Enhancement request for pg_dump

Started by Sergei Agalakovalmost 10 years ago23 messagesgeneral

Sergei.Agalakov@getmyle.com

almost 10 years ago

Hi,

Currently as in PG 9.4, 9.5 the order of the statements in the script
produced by pg_dump is uncertain even for the same versions of the
databases and pg_dump.
One database may script grants like

REVOKE ALL ON TABLE contracttype FROM PUBLIC;
REVOKE ALL ON TABLE contracttype FROM madmin;
GRANT ALL ON TABLE contracttype TO madmin;
GRANT SELECT ON TABLE contracttype TO mro;
GRANT SELECT,INSERT,DELETE,UPDATE ON TABLE contracttype TO musers;

and the other may change the order of grants like

REVOKE ALL ON TABLE contracttype FROM PUBLIC;
REVOKE ALL ON TABLE contracttype FROM madmin;
GRANT ALL ON TABLE contracttype TO madmin;
GRANT SELECT,INSERT,DELETE,UPDATE ON TABLE contracttype TO musers;
GRANT SELECT ON TABLE contracttype TO mro;

It complicates the usage of pg_dump to compare the structures of the two
similar databases like DEV and PROD, two development branches etc.
If the order of the statements generated by pg_dump would be guaranteed
then it will be very easy to compare the structures and
security rights of the two databases using only pg_dump and a diff/merge
tool. Currently we encounter a lot of false differences.
A sorted order of the DDL and DCL statements in a dump can be
implemented as a flag to pg_dump or even better as a default behavior.

Thank you,

Sergei Agalakov

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Karsten Hilbert

Karsten.Hilbert@gmx.net

almost 10 years ago

In reply to: Sergei Agalakov (#1)

Re: Enhancement request for pg_dump

On Sat, Apr 16, 2016 at 01:33:21PM -0600, Sergei Agalakov wrote:

Currently as in PG 9.4, 9.5 the order of the statements in the script
produced by pg_dump is uncertain even for the same versions of the databases
and pg_dump.
One database may script grants like

REVOKE ALL ON TABLE contracttype FROM PUBLIC;
REVOKE ALL ON TABLE contracttype FROM madmin;
GRANT ALL ON TABLE contracttype TO madmin;
GRANT SELECT ON TABLE contracttype TO mro;
GRANT SELECT,INSERT,DELETE,UPDATE ON TABLE contracttype TO musers;

and the other may change the order of grants like

REVOKE ALL ON TABLE contracttype FROM PUBLIC;
REVOKE ALL ON TABLE contracttype FROM madmin;
GRANT ALL ON TABLE contracttype TO madmin;
GRANT SELECT,INSERT,DELETE,UPDATE ON TABLE contracttype TO musers;
GRANT SELECT ON TABLE contracttype TO mro;

It complicates the usage of pg_dump to compare the structures of the two
similar databases like DEV and PROD, two development branches etc.
If the order of the statements generated by pg_dump would be guaranteed then
it will be very easy to compare the structures and
security rights of the two databases using only pg_dump and a diff/merge
tool. Currently we encounter a lot of false differences.
A sorted order of the DDL and DCL statements in a dump can be implemented as
a flag to pg_dump or even better as a default behavior.

Since the actual order of statements inside the text mode
dump file does not matter (no restore is being attempted) --
rather only that the order is predictable -- would it not
suffice to run the two dumps through a generic text sort
program ?

pg_dump -D DEV ... | sort > broken-but-sorted-dump-1.txt
pg_dump -D PROD ... | sort > broken-but-sorted-dump-2.txt
diff ... broken-but-sorted-dump-1.txt broken-but-sorted-dump-2.txt

Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Bill Moran

wmoran@potentialtech.com

almost 10 years ago

In reply to: Sergei Agalakov (#1)

Re: Enhancement request for pg_dump

On Sat, 16 Apr 2016 13:33:21 -0600
Sergei Agalakov <Sergei.Agalakov@getmyle.com> wrote:

Hi,

Currently as in PG 9.4, 9.5 the order of the statements in the script
produced by pg_dump is uncertain even for the same versions of the
databases and pg_dump.
One database may script grants like

REVOKE ALL ON TABLE contracttype FROM PUBLIC;
REVOKE ALL ON TABLE contracttype FROM madmin;
GRANT ALL ON TABLE contracttype TO madmin;
GRANT SELECT ON TABLE contracttype TO mro;
GRANT SELECT,INSERT,DELETE,UPDATE ON TABLE contracttype TO musers;

and the other may change the order of grants like

REVOKE ALL ON TABLE contracttype FROM PUBLIC;
REVOKE ALL ON TABLE contracttype FROM madmin;
GRANT ALL ON TABLE contracttype TO madmin;
GRANT SELECT,INSERT,DELETE,UPDATE ON TABLE contracttype TO musers;
GRANT SELECT ON TABLE contracttype TO mro;

It complicates the usage of pg_dump to compare the structures of the two
similar databases like DEV and PROD, two development branches etc.

I don't think pg_dump was ever intended to serve that purpose.

dbsteward, on the other hand, does what you want:
https://github.com/nkiraly/DBSteward/wiki

--
Bill Moran

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Sergei Agalakov

Sergei.Agalakov@getmyle.com

almost 10 years ago

In reply to: Karsten Hilbert (#2)

It can be done of course, but as you can see in my examples the
statements in pg_dump generated scripts are grouped together by the objects.
It is easier to analyze the differences when all these differences for
an object are clustered together, and aren't dispersed in the diff file.
It also will break the multi-line statements.
It is also possible to write a more complex Perl script, or sed/awk, but
it would be so much easier to do it directly in pg_dump.

Sergei

Show quoted text

Currently as in PG 9.4, 9.5 the order of the statements in the script
produced by pg_dump is uncertain even for the same versions of the databases
and pg_dump.
One database may script grants like

REVOKE ALL ON TABLE contracttype FROM PUBLIC;
REVOKE ALL ON TABLE contracttype FROM madmin;
GRANT ALL ON TABLE contracttype TO madmin;
GRANT SELECT ON TABLE contracttype TO mro;
GRANT SELECT,INSERT,DELETE,UPDATE ON TABLE contracttype TO musers;

and the other may change the order of grants like

REVOKE ALL ON TABLE contracttype FROM PUBLIC;
REVOKE ALL ON TABLE contracttype FROM madmin;
GRANT ALL ON TABLE contracttype TO madmin;
GRANT SELECT,INSERT,DELETE,UPDATE ON TABLE contracttype TO musers;
GRANT SELECT ON TABLE contracttype TO mro;

It complicates the usage of pg_dump to compare the structures of the two
similar databases like DEV and PROD, two development branches etc.
If the order of the statements generated by pg_dump would be guaranteed then
it will be very easy to compare the structures and
security rights of the two databases using only pg_dump and a diff/merge
tool. Currently we encounter a lot of false differences.
A sorted order of the DDL and DCL statements in a dump can be implemented as
a flag to pg_dump or even better as a default behavior.

Since the actual order of statements inside the text mode
dump file does not matter (no restore is being attempted) --
rather only that the order is predictable -- would it not
suffice to run the two dumps through a generic text sort
program ?

pg_dump -D DEV ... | sort > broken-but-sorted-dump-1.txt
pg_dump -D PROD ... | sort > broken-but-sorted-dump-2.txt
diff ... broken-but-sorted-dump-1.txt broken-but-sorted-dump-2.txt

Karsten
--
GPG key ID E4071346 @eu.pool.sks-keyservers.net <http://eu.pool.sks-keyservers.net/>
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

Sergei Agalakov

Sergei.Agalakov@getmyle.com

almost 10 years ago

In reply to: Bill Moran (#3)

20160417105248.d20dcefed39b5d9031c6b28d@potentialtech.com

I know about DBSteward. I don't like to bring PHP infrastructure only to
be able to compare two dumps,
and to deal with potential bugs in the third party tools. The pg_dump in
other hand is always here, and is always trusted.
SQLWorkbench/J also can compare two schemas, and requires only Java.
Again, I trust pg_dump more.
http://www.sql-workbench.net/

May be pg_dump was never INTENDED to generate the dump files with the
determined order of the statements,
but it CAN do it with the minor changes, and be more useful to
administrators. Why rely on the third party tools
for the tasks that can be done with the native, trusted tools?

Sergei

Hi,

Currently as in PG 9.4, 9.5 the order of the statements in the script
produced by pg_dump is uncertain even for the same versions of the
databases and pg_dump.
One database may script grants like

REVOKE ALL ON TABLE contracttype FROM PUBLIC;
REVOKE ALL ON TABLE contracttype FROM madmin;
GRANT ALL ON TABLE contracttype TO madmin;
GRANT SELECT ON TABLE contracttype TO mro;
GRANT SELECT,INSERT,DELETE,UPDATE ON TABLE contracttype TO musers;

and the other may change the order of grants like

REVOKE ALL ON TABLE contracttype FROM PUBLIC;
REVOKE ALL ON TABLE contracttype FROM madmin;
GRANT ALL ON TABLE contracttype TO madmin;
GRANT SELECT,INSERT,DELETE,UPDATE ON TABLE contracttype TO musers;
GRANT SELECT ON TABLE contracttype TO mro;

It complicates the usage of pg_dump to compare the structures of the

two

similar databases like DEV and PROD, two development branches etc.

I don't think pg_dump was ever intended to serve that purpose.

dbsteward, on the other hand, does what you want:
https://github.com/nkiraly/DBSteward/wiki

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Rob Sargent

robjsargent@gmail.com

almost 10 years ago

In reply to: Sergei Agalakov (#5)

Re: 20160417105248.d20dcefed39b5d9031c6b28d@potentialtech.com

On Apr 17, 2016, at 12:41 PM, Sergei Agalakov <Sergei.Agalakov@getmyle.com> wrote:

I know about DBSteward. I don't like to bring PHP infrastructure only to be able to compare two dumps,
and to deal with potential bugs in the third party tools. The pg_dump in other hand is always here, and is always trusted.
SQLWorkbench/J also can compare two schemas, and requires only Java. Again, I trust pg_dump more.
http://www.sql-workbench.net/

May be pg_dump was never INTENDED to generate the dump files with the determined order of the statements,
but it CAN do it with the minor changes, and be more useful to administrators. Why rely on the third party tools
for the tasks that can be done with the native, trusted tools?

Sergei

Does it matter if they differ if you cannot recreate the correct one exactly from source-controllled DDL? Or know how they are supposed to differ if this is a migration point?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Sergei Agalakov

Sergei.Agalakov@getmyle.com

almost 10 years ago

In reply to: Rob Sargent (#6)

Re: Enhancement request for pg_dump

I don't see how these questions are related to the proposed pg_dump
improvement.
I suggest to improve pg_dump so it can be used instead of the third
party tools like DBSteward and SQLWorkbench/J etc.
to compare two different databases or existing dumps, and to identify
the differences. The use cases will be exactly
the same as for the third party tools. The positive difference will be
that pg_dump is a very reliable, always available and supports all the
latest PostgreSQL features.
Do you imply that there shouldn't be any reasons to compare different
databases to find the differences between them?

Sergei

On Apr 17, 2016, at 12:41 PM, Sergei Agalakov <Sergei(dot)Agalakov(at)getmyle(dot)com> wrote:

I know about DBSteward. I don't like to bring PHP infrastructure only to be able to compare two dumps,
and to deal with potential bugs in the third party tools. The pg_dump in other hand is always here, and is always trusted.
SQLWorkbench/J also can compare two schemas, and requires only Java. Again, I trust pg_dump more.
http://www.sql-workbench.net/

May be pg_dump was never INTENDED to generate the dump files with the determined order of the statements,
but it CAN do it with the minor changes, and be more useful to administrators. Why rely on the third party tools
for the tasks that can be done with the native, trusted tools?

Sergei

Does it matter if they differ if you cannot recreate the correct one exactly from source-controllled DDL? Or know how they are supposed to differ if this is a migration point?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Bill Moran

wmoran@potentialtech.com

almost 10 years ago

In reply to: Sergei Agalakov (#7)

Re: Enhancement request for pg_dump

On Sun, 17 Apr 2016 14:10:50 -0600
Sergei Agalakov <Sergei.Agalakov@getmyle.com> wrote:

I don't see how these questions are related to the proposed pg_dump
improvement.
I suggest to improve pg_dump so it can be used instead of the third
party tools like DBSteward and SQLWorkbench/J etc.
to compare two different databases or existing dumps, and to identify
the differences. The use cases will be exactly
the same as for the third party tools. The positive difference will be
that pg_dump is a very reliable, always available and supports all the
latest PostgreSQL features.
Do you imply that there shouldn't be any reasons to compare different
databases to find the differences between them?

Nobody has weighed in on this, but I have a theory ...

I (personally) worry that adding features like you suggest to pg_dump
would interfere with its ability to perform complete dump of a large
database in a _rapid_ manner. Using pg_dump as a backup tool has an
inherent desire for the tool to be as fast and low-impact on the
operation of the database as possible.

Features that would force pg_dump to care about ordering that isn't
necessary to its core functionality of providing a reliable backup
are liable to slow it down. They might also overcomplicate it, making
it more difficult to maintain reliably.

When you consider that possibility, and the fact that pg_dump isn't
_supposed_ to be a tool to help you with schema maintenance, it's easy
to see why someone would look for different approach to the problem.

And I feel that's what all the answers have attempted to do: suggest
ways to get what you want without asking them to be implemented in a
tool that isn't really the right place for them anyway. While your
arguments toward making this change are valid, I'm not sure that
they are compelling enough to justify adding a feature where it
doesn't really belong.

Another side to this, is that your request suggests that your
development process is suboptimal. Of course, I can't be 100% sure
since you haven't explained your process ... but my experience is
that people who feel the need to automagically sync prod and dev
databases have a suboptimal development process. Thus, the suggestions
are also biased toward helping you improve your process instead of
adjusting a tool to better support a suboptimal process.

Of course, if the people actually doing the work on the code disagree
with me, then they'll make the change. I'm just expressing an opinion.

Sergei

On Apr 17, 2016, at 12:41 PM, Sergei Agalakov <Sergei(dot)Agalakov(at)getmyle(dot)com> wrote:

I know about DBSteward. I don't like to bring PHP infrastructure only to be able to compare two dumps,
and to deal with potential bugs in the third party tools. The pg_dump in other hand is always here, and is always trusted.
SQLWorkbench/J also can compare two schemas, and requires only Java. Again, I trust pg_dump more.
http://www.sql-workbench.net/

May be pg_dump was never INTENDED to generate the dump files with the determined order of the statements,
but it CAN do it with the minor changes, and be more useful to administrators. Why rely on the third party tools
for the tasks that can be done with the native, trusted tools?

Sergei

Does it matter if they differ if you cannot recreate the correct one exactly from source-controllled DDL? Or know how they are supposed to differ if this is a migration point?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Bill Moran

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Adrian Klaver

adrian.klaver@aklaver.com

almost 10 years ago

In reply to: Sergei Agalakov (#7)

Re: Enhancement request for pg_dump

On 04/17/2016 01:10 PM, Sergei Agalakov wrote:

I don't see how these questions are related to the proposed pg_dump
improvement.
I suggest to improve pg_dump so it can be used instead of the third
party tools like DBSteward and SQLWorkbench/J etc.
to compare two different databases or existing dumps, and to identify
the differences. The use cases will be exactly
the same as for the third party tools. The positive difference will be
that pg_dump is a very reliable, always available and supports all the
latest PostgreSQL features.
Do you imply that there shouldn't be any reasons to compare different
databases to find the differences between them?

I don't think that is what is being said, more the right tool for the
right job. pg_dump --> pg_restore/psql are for capturing some or all of
the information(including possibly data) in a database at a point in
time so that information can be recreated at another point in time.
While by necessity that includes capturing schema data (or not in the
case of -a) it is not really a schema comparison tool. While that is in
the realm of doable it means developer time to replicate something that
other tools do. Given the ever lengthening list of requested features in
general, this request might have difficulty reaching a sufficient level
of priority, esp. in light of the presence of existing tools.

Sergei

On Apr 17, 2016, at 12:41 PM, Sergei Agalakov

<Sergei(dot)Agalakov(at)getmyle(dot)com> wrote:

I know about DBSteward. I don't like to bring PHP infrastructure

only to be able to compare two dumps,

and to deal with potential bugs in the third party tools. The

pg_dump in other hand is always here, and is always trusted.

SQLWorkbench/J also can compare two schemas, and requires only Java.

Again, I trust pg_dump more.

http://www.sql-workbench.net/

May be pg_dump was never INTENDED to generate the dump files with

the determined order of the statements,

but it CAN do it with the minor changes, and be more useful to

administrators. Why rely on the third party tools

for the tasks that can be done with the native, trusted tools?

Sergei

Does it matter if they differ if you cannot recreate the correct one
exactly from source-controllled DDL? Or know how they are supposed to
differ if this is a migration point?

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10

Adrian Klaver

adrian.klaver@aklaver.com

almost 10 years ago

In reply to: Sergei Agalakov (#7)

Re: Enhancement request for pg_dump

On 04/17/2016 01:10 PM, Sergei Agalakov wrote:

I don't see how these questions are related to the proposed pg_dump
improvement.
I suggest to improve pg_dump so it can be used instead of the third
party tools like DBSteward and SQLWorkbench/J etc.
to compare two different databases or existing dumps, and to identify
the differences. The use cases will be exactly
the same as for the third party tools. The positive difference will be
that pg_dump is a very reliable, always available and supports all the
latest PostgreSQL features.
Do you imply that there shouldn't be any reasons to compare different
databases to find the differences between them?

To follow up my previous post and to illustrate some of the
difficulties, from your original post:

"One database may script grants like

and the other may change the order of grants like

From the perspective of the database both of the above lead to the same
end result, so order is not important. Of course a diff is going to see
it differently. The solution is then to impose an order, but how would
that be determined? For instance what about:

GRANT SELECT ON TABLE contracttype TO mro;
GRANT SELECT,INSERT,DELETE,UPDATE ON TABLE contracttype TO musers;

GRANT SELECT ON TABLE contracttype TO mro;
GRANT INSERT, SELECT, DELETE,UPDATE ON TABLE contracttype TO musers;

Again diff will see them as not the same, but functionally they are the
same. So who decides order and how far do you reach down into the
statements?

As Bill wrote the issue is after the fact version control versus before
the fact version control. Trying to match things up after various people
have been turned loose at will on different instances of databases is
much more difficult then having them go through a structured version
control system first.

Sergei

On Apr 17, 2016, at 12:41 PM, Sergei Agalakov

<Sergei(dot)Agalakov(at)getmyle(dot)com> wrote:

I know about DBSteward. I don't like to bring PHP infrastructure

only to be able to compare two dumps,

and to deal with potential bugs in the third party tools. The

pg_dump in other hand is always here, and is always trusted.

SQLWorkbench/J also can compare two schemas, and requires only Java.

Again, I trust pg_dump more.

http://www.sql-workbench.net/

May be pg_dump was never INTENDED to generate the dump files with

the determined order of the statements,

but it CAN do it with the minor changes, and be more useful to

administrators. Why rely on the third party tools

for the tasks that can be done with the native, trusted tools?

Sergei

Does it matter if they differ if you cannot recreate the correct one
exactly from source-controllled DDL? Or know how they are supposed to
differ if this is a migration point?

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#11

Adrian Klaver

adrian.klaver@aklaver.com

almost 10 years ago

In reply to: Adrian Klaver (#10)

Re: Enhancement request for pg_dump

On 04/17/2016 01:58 PM, Adrian Klaver wrote:

On 04/17/2016 01:10 PM, Sergei Agalakov wrote:

I don't see how these questions are related to the proposed pg_dump
improvement.
I suggest to improve pg_dump so it can be used instead of the third
party tools like DBSteward and SQLWorkbench/J etc.
to compare two different databases or existing dumps, and to identify
the differences. The use cases will be exactly
the same as for the third party tools. The positive difference will be
that pg_dump is a very reliable, always available and supports all the
latest PostgreSQL features.
Do you imply that there shouldn't be any reasons to compare different
databases to find the differences between them?

To follow up my previous post and to illustrate some of the
difficulties, from your original post:

"One database may script grants like

REVOKE ALL ON TABLE contracttype FROM PUBLIC;
REVOKE ALL ON TABLE contracttype FROM madmin;
GRANT ALL ON TABLE contracttype TO madmin;
GRANT SELECT ON TABLE contracttype TO mro;
GRANT SELECT,INSERT,DELETE,UPDATE ON TABLE contracttype TO musers;

and the other may change the order of grants like

REVOKE ALL ON TABLE contracttype FROM PUBLIC;
REVOKE ALL ON TABLE contracttype FROM madmin;
GRANT ALL ON TABLE contracttype TO madmin;
GRANT SELECT,INSERT,DELETE,UPDATE ON TABLE contracttype TO musers;
GRANT SELECT ON TABLE contracttype TO mro;
"

From the perspective of the database both of the above lead to the same
end result, so order is not important. Of course a diff is going to see
it differently. The solution is then to impose an order, but how would
that be determined? For instance what about:

GRANT SELECT ON TABLE contracttype TO mro;
GRANT SELECT,INSERT,DELETE,UPDATE ON TABLE contracttype TO musers;

vs

GRANT SELECT ON TABLE contracttype TO mro;
GRANT INSERT, SELECT, DELETE,UPDATE ON TABLE contracttype TO musers;

Again diff will see them as not the same, but functionally they are the
same. So who decides order and how far do you reach down into the
statements?

Bad example. Some testing shows Postgres will reorder the GRANTS as:

SELECT,INSERT,DELETE,UPDATE

from whatever order they where entered as.

As Bill wrote the issue is after the fact version control versus before
the fact version control. Trying to match things up after various people
have been turned loose at will on different instances of databases is
much more difficult then having them go through a structured version
control system first.

Sergei

On Apr 17, 2016, at 12:41 PM, Sergei Agalakov

<Sergei(dot)Agalakov(at)getmyle(dot)com> wrote:

I know about DBSteward. I don't like to bring PHP infrastructure

only to be able to compare two dumps,

and to deal with potential bugs in the third party tools. The

pg_dump in other hand is always here, and is always trusted.

SQLWorkbench/J also can compare two schemas, and requires only Java.

Again, I trust pg_dump more.

http://www.sql-workbench.net/

May be pg_dump was never INTENDED to generate the dump files with

the determined order of the statements,

but it CAN do it with the minor changes, and be more useful to

administrators. Why rely on the third party tools

for the tasks that can be done with the native, trusted tools?

Sergei

Does it matter if they differ if you cannot recreate the correct one
exactly from source-controllled DDL? Or know how they are supposed to
differ if this is a migration point?

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#12

Sergei Agalakov

Sergei.Agalakov@getmyle.com

almost 10 years ago

In reply to: Bill Moran (#8)

Re: Enhancement request for pg_dump

I hardly can see that a sorting of the grants by users will create a
measurable impact on the pg_dump performance in a real database.
One can imaging a database with tens of thousands of objects and tens of
thousands of users and almost no data, but it would be quite unusual.
Anyway, if a sorting behavior is initiated by a command line parameter,
and isn't a default behavior of pg_dump then this argument doesn't work.
After all pg_dump isn't the tool to do _just_ reliable backup. It can be
used for migration, it can be used for schema cloning, to initiate a
standby...
There are many flags for pg_dump that are absolutely unnecessary for
full database backup. So they do
"... might also overcomplicate it, making it more difficult to maintain
reliably" but they do exists, and serve a purpose.

I don't understand why people have started to create the theories about
our development process? Had I requested a tool to magically synchronize
DEV and PROD? No, I asked about a tool to _find_ the unexpected
differences between databases. If you never encountered a situation when
in the
dozens of environments the databases has diverged because somebody has
done something manually - good for you, you are lucky guy then.
I did.

Sergei

On Sun, 17 Apr 2016 14:10:50 -0600
Sergei Agalakov <Sergei(dot)Agalakov(at)getmyle(dot)com> wrote:

I don't see how these questions are related to the proposed pg_dump
improvement.
I suggest to improve pg_dump so it can be used instead of the third
party tools like DBSteward and SQLWorkbench/J etc.
to compare two different databases or existing dumps, and to identify
the differences. The use cases will be exactly
the same as for the third party tools. The positive difference will be
that pg_dump is a very reliable, always available and supports all the
latest PostgreSQL features.
Do you imply that there shouldn't be any reasons to compare different
databases to find the differences between them?

Nobody has weighed in on this, but I have a theory ...

I (personally) worry that adding features like you suggest to pg_dump
would interfere with its ability to perform complete dump of a large
database in a _rapid_ manner. Using pg_dump as a backup tool has an
inherent desire for the tool to be as fast and low-impact on the
operation of the database as possible.

Features that would force pg_dump to care about ordering that isn't
necessary to its core functionality of providing a reliable backup
are liable to slow it down. They might also overcomplicate it, making
it more difficult to maintain reliably.

When you consider that possibility, and the fact that pg_dump isn't
_supposed_ to be a tool to help you with schema maintenance, it's easy
to see why someone would look for different approach to the problem.

And I feel that's what all the answers have attempted to do: suggest
ways to get what you want without asking them to be implemented in a
tool that isn't really the right place for them anyway. While your
arguments toward making this change are valid, I'm not sure that
they are compelling enough to justify adding a feature where it
doesn't really belong.

Another side to this, is that your request suggests that your
development process is suboptimal. Of course, I can't be 100% sure
since you haven't explained your process ... but my experience is
that people who feel the need to automagically sync prod and dev
databases have a suboptimal development process. Thus, the suggestions
are also biased toward helping you improve your process instead of
adjusting a tool to better support a suboptimal process.

Of course, if the people actually doing the work on the code disagree
with me, then they'll make the change. I'm just expressing an opinion.

Sergei

On Apr 17, 2016, at 12:41 PM, Sergei Agalakov

<Sergei(dot)Agalakov(at)getmyle(dot)com> wrote:

I know about DBSteward. I don't like to bring PHP infrastructure

only to be able to compare two dumps,

and to deal with potential bugs in the third party tools. The

pg_dump in other hand is always here, and is always trusted.

SQLWorkbench/J also can compare two schemas, and requires only

Java. Again, I trust pg_dump more.

http://www.sql-workbench.net/

May be pg_dump was never INTENDED to generate the dump files

with the determined order of the statements,

but it CAN do it with the minor changes, and be more useful to

administrators. Why rely on the third party tools

for the tasks that can be done with the native, trusted tools?

Sergei

Does it matter if they differ if you cannot recreate the correct

one exactly from source-controllled DDL? Or know how they are
supposed to differ if this is a migration point?

--
Sent via pgsql-general mailing list

(pgsql-general(at)postgresql(dot)org)

To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Bill Moran

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#13

Melvin Davidson

melvin6925@gmail.com

almost 10 years ago

In reply to: Sergei Agalakov (#12)

Re: Enhancement request for pg_dump

fyi, if you have a feature request or enhancement, then the proper place
for that is here -> https://postgresql.uservoice.com/forums/21853-general

On Sun, Apr 17, 2016 at 8:26 PM, Sergei Agalakov <
Sergei.Agalakov@getmyle.com> wrote:

I hardly can see that a sorting of the grants by users will create a
measurable impact on the pg_dump performance in a real database.
One can imaging a database with tens of thousands of objects and tens of
thousands of users and almost no data, but it would be quite unusual.
Anyway, if a sorting behavior is initiated by a command line parameter,
and isn't a default behavior of pg_dump then this argument doesn't work.
After all pg_dump isn't the tool to do _just_ reliable backup. It can be
used for migration, it can be used for schema cloning, to initiate a
standby...
There are many flags for pg_dump that are absolutely unnecessary for full
database backup. So they do
"... might also overcomplicate it, making it more difficult to maintain
reliably" but they do exists, and serve a purpose.

I don't understand why people have started to create the theories about
our development process? Had I requested a tool to magically synchronize
DEV and PROD? No, I asked about a tool to _find_ the unexpected
differences between databases. If you never encountered a situation when in
the
dozens of environments the databases has diverged because somebody has
done something manually - good for you, you are lucky guy then.
I did.

Sergei

On Sun, 17 Apr 2016 14:10:50 -0600
Sergei Agalakov <Sergei(dot)Agalakov(at)getmyle(dot)com> wrote:

I don't see how these questions are related to the proposed pg_dump
improvement.
I suggest to improve pg_dump so it can be used instead of the third
party tools like DBSteward and SQLWorkbench/J etc.
to compare two different databases or existing dumps, and to identify
the differences. The use cases will be exactly
the same as for the third party tools. The positive difference will be
that pg_dump is a very reliable, always available and supports all the
latest PostgreSQL features.
Do you imply that there shouldn't be any reasons to compare different
databases to find the differences between them?

Nobody has weighed in on this, but I have a theory ...

I (personally) worry that adding features like you suggest to pg_dump
would interfere with its ability to perform complete dump of a large
database in a _rapid_ manner. Using pg_dump as a backup tool has an
inherent desire for the tool to be as fast and low-impact on the
operation of the database as possible.

Features that would force pg_dump to care about ordering that isn't
necessary to its core functionality of providing a reliable backup
are liable to slow it down. They might also overcomplicate it, making
it more difficult to maintain reliably.

When you consider that possibility, and the fact that pg_dump isn't
_supposed_ to be a tool to help you with schema maintenance, it's easy
to see why someone would look for different approach to the problem.

And I feel that's what all the answers have attempted to do: suggest
ways to get what you want without asking them to be implemented in a
tool that isn't really the right place for them anyway. While your
arguments toward making this change are valid, I'm not sure that
they are compelling enough to justify adding a feature where it
doesn't really belong.

Another side to this, is that your request suggests that your
development process is suboptimal. Of course, I can't be 100% sure
since you haven't explained your process ... but my experience is
that people who feel the need to automagically sync prod and dev
databases have a suboptimal development process. Thus, the suggestions
are also biased toward helping you improve your process instead of
adjusting a tool to better support a suboptimal process.

Of course, if the people actually doing the work on the code disagree
with me, then they'll make the change. I'm just expressing an opinion.

Sergei

On Apr 17, 2016, at 12:41 PM, Sergei Agalakov

<Sergei(dot)Agalakov(at)getmyle(dot)com> wrote:

I know about DBSteward. I don't like to bring PHP infrastructure

only to be able to compare two dumps,

and to deal with potential bugs in the third party tools. The

pg_dump in other hand is always here, and is always trusted.

SQLWorkbench/J also can compare two schemas, and requires only

Java. Again, I trust pg_dump more.

http://www.sql-workbench.net/

May be pg_dump was never INTENDED to generate the dump files with

the determined order of the statements,

but it CAN do it with the minor changes, and be more useful to

administrators. Why rely on the third party tools

for the tasks that can be done with the native, trusted tools?

Sergei

Does it matter if they differ if you cannot recreate the correct one

exactly from source-controllled DDL? Or know how they are supposed to
differ if this is a migration point?

--
Sent via pgsql-general mailing list

(pgsql-general(at)postgresql(dot)org)

To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Bill Moran

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#14

Sergei Agalakov

Sergei.Agalakov@getmyle.com

almost 10 years ago

In reply to: Adrian Klaver (#9)

Re: Enhancement request for pg_dump

Nobody asks for pg_dump to be a schema comparison tool. As you tell
yourself
it is a most reliable schema capturing tool. All I am asking is that if
pg_dump is executed
on two databases with the identical schemas and security it should be
able to produce
the identical SQL dumps of these schemas and security. As you have
mentioned in other e-mail
pg_dump actually rewrites some statements for consistency. It just
doesn't do it consistently everywhere.

I can't say anything about priorities of development for pg_dump. The
proposed change seems to be
a low hanging fruit, it isn't difficult to add ORDER BY in the
appropriate places. The other question is if
this is a useful enhancement. The existence of the third party tools
doesn't seem to be very relevant here.
Should be stopped the development of pgAdmin or psql because exist the
third party tools with the similar functionality?
:-)

Sergei

On 04/17/2016 01:10 PM, Sergei Agalakov wrote:

I don't see how these questions are related to the proposed pg_dump
improvement.
I suggest to improve pg_dump so it can be used instead of the third
party tools like DBSteward and SQLWorkbench/J etc.
to compare two different databases or existing dumps, and to identify
the differences. The use cases will be exactly
the same as for the third party tools. The positive difference will be
that pg_dump is a very reliable, always available and supports all the
latest PostgreSQL features.
Do you imply that there shouldn't be any reasons to compare different
databases to find the differences between them?

I don't think that is what is being said, more the right tool for the
right job. pg_dump --> pg_restore/psql are for capturing some or all of
the information(including possibly data) in a database at a point in
time so that information can be recreated at another point in time.
While by necessity that includes capturing schema data (or not in the
case of -a) it is not really a schema comparison tool. While that is in
the realm of doable it means developer time to replicate something that
other tools do. Given the ever lengthening list of requested features in
general, this request might have difficulty reaching a sufficient level
of priority, esp. in light of the presence of existing tools.

Sergei

On Apr 17, 2016, at 12:41 PM, Sergei Agalakov

<Sergei(dot)Agalakov(at)getmyle(dot)com> wrote:

I know about DBSteward. I don't like to bring PHP infrastructure

only to be able to compare two dumps,

and to deal with potential bugs in the third party tools. The

pg_dump in other hand is always here, and is always trusted.

SQLWorkbench/J also can compare two schemas, and requires only Java.

Again, I trust pg_dump more.

http://www.sql-workbench.net/

May be pg_dump was never INTENDED to generate the dump files with

the determined order of the statements,

but it CAN do it with the minor changes, and be more useful to

administrators. Why rely on the third party tools

for the tasks that can be done with the native, trusted tools?

Sergei

Does it matter if they differ if you cannot recreate the correct one
exactly from source-controllled DDL? Or know how they are supposed to
differ if this is a migration point?

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#15

Sergei Agalakov

Sergei.Agalakov@getmyle.com

almost 10 years ago

In reply to: Melvin Davidson (#13)

Re: Enhancement request for pg_dump

Thank you, I know this place.
I just wanted to check that my request will have the peoples support.
So far it doesn't. It looks like that or people never need to compare
two PG databases to find the differences in the schemas or security,
or happy to use the third party tools to do it, and don't want any
native support. If I see any support from other people for this idea
then I shall
go to https://postgresql.uservoice.com/forums/21853-general, but looking
on, say, "Partitions in Oracle style" that are marked as have been
started in 2010
(sure, INHERITANCE is so much Oracle style partitions!) I don't see it
to be very useful.

Sergei

fyi, if you have a feature request or enhancement, then the proper
place for that is here ->
https://postgresql.uservoice.com/forums/21853-general

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#16

David Rowley

dgrowleyml@gmail.com

almost 10 years ago

In reply to: Sergei Agalakov (#15)

Re: Enhancement request for pg_dump

On 18 April 2016 at 13:10, Sergei Agalakov <Sergei.Agalakov@getmyle.com> wrote:

Thank you, I know this place.
I just wanted to check that my request will have the peoples support.
So far it doesn't. It looks like that or people never need to compare two PG
databases to find the differences in the schemas or security,
or happy to use the third party tools to do it, and don't want any native
support. If I see any support from other people for this idea then I shall
go to https://postgresql.uservoice.com/forums/21853-general, but looking on,
say, "Partitions in Oracle style" that are marked as have been started in
2010
(sure, INHERITANCE is so much Oracle style partitions!) I don't see it to be
very useful.

I can't particularly vouch for that site, as I've personally never
seen it before, but I'd like to say that you'll probably get along
better if you appeared to have a more optimistic view. If you bothered
to consider the "parallel query option" item listed on that site, and
compared that to the current status of 9.6, you might feel
differently. EDB and others have put lots of work in to parallel query
for 9.6. If your intentions here are to gather support for your cause
then I highly recommend not appearing negative. Keep in mind that
you've not paid some company for a license for PostgreSQL and the
people reading your emails here are most likely not at your beckon
call, and are not here to fulfill all your PostgreSQL wishes.

To me your proposal does seem quite half thought through. Do you
really suppose we just sort the GRANT output and call it done. pg_dump
now has stable output? I think that would barely scratch the surface.
What about COPY output, we'd have to sort that too, and that could be
rather expensive. Now, you could say that we'd just limit this to
schema-only related stuff, and that might be ok, but you'll need to
ensure that everything is addressed and that your now matching output
didn't just occur because all of the planets happened to line up on
the day you ran pg_dump. You might propose that we could get around
the performance hit of generating a stable output by having an
optional flag to enable this. That would appear to sound ok at my
first thought. If C is your thing then you could open up pg_dump.c
and have a look around, if not then remaining positive and
constructive, and doing your best not to upset people who's C *is*
their thing is probably a good tactical move here.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#17

Adrian Klaver

adrian.klaver@aklaver.com

almost 10 years ago

In reply to: Sergei Agalakov (#15)

Re: Enhancement request for pg_dump

On 04/17/2016 06:10 PM, Sergei Agalakov wrote:

Thank you, I know this place.
I just wanted to check that my request will have the peoples support.
So far it doesn't. It looks like that or people never need to compare
two PG databases to find the differences in the schemas or security,
or happy to use the third party tools to do it, and don't want any
native support. If I see any support from other people for this idea
then I shall
go to https://postgresql.uservoice.com/forums/21853-general, but looking
on, say, "Partitions in Oracle style" that are marked as have been
started in 2010
(sure, INHERITANCE is so much Oracle style partitions!) I don't see it
to be very useful.

Honestly I did not know that site existed. If you want some traction on
this I would suggest the traditional way.

If you think it is a bug:
http://www.postgresql.org/support/submitbug/

If you think it should be a new feature then make your case on --hackers:
http://www.postgresql.org/list/pgsql-hackers/

FYI, the place I look for requested features is the Todo list:

https://wiki.postgresql.org/wiki/Todo

Sergei

fyi, if you have a feature request or enhancement, then the proper
place for that is here ->
https://postgresql.uservoice.com/forums/21853-general

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#18

Adrian Klaver

adrian.klaver@aklaver.com

almost 10 years ago

In reply to: Sergei Agalakov (#14)

Re: Enhancement request for pg_dump

On 04/17/2016 05:50 PM, Sergei Agalakov wrote:

Nobody asks for pg_dump to be a schema comparison tool. As you tell
yourself
it is a most reliable schema capturing tool. All I am asking is that if
pg_dump is executed
on two databases with the identical schemas and security it should be
able to produce
the identical SQL dumps of these schemas and security. As you have
mentioned in other e-mail
pg_dump actually rewrites some statements for consistency. It just
doesn't do it consistently everywhere.

And there in lies the rub. Making that happen, I suspect, is going to be
a lot of work. The goal of the tool is not to produce output that is
diff friendly but that produces working schema when transferred to
another database. I understand what you want and why I just think it is
not as easy as you want to believe. See my other post for ways to try to
make this happen.

I can't say anything about priorities of development for pg_dump. The
proposed change seems to be
a low hanging fruit, it isn't difficult to add ORDER BY in the
appropriate places. The other question is if
this is a useful enhancement. The existence of the third party tools
doesn't seem to be very relevant here.
Should be stopped the development of pgAdmin or psql because exist the
third party tools with the similar functionality?
:-)

FYI, pgAdmin is a third party tool, currently being completely rewritten:

http://pgsnake.blogspot.com/2016/04/pgadmin-4-elephant-nears-finish-line.html

Sergei

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#19

Pierre Chevalier Géologue

pierrechevaliergeol@free.fr

almost 10 years ago

In reply to: Sergei Agalakov (#12)

Re: Enhancement request for pg_dump

Hi,
Le 18/04/2016 02:26, Sergei Agalakov a écrit :

If you never encountered a situation when in the dozens of
environments the databases has diverged because somebody has
done something manually - good for you, you are lucky guy then.

I'm definitely not a lucky guy at all! :-)
And this is happening to me *right now*... My case is a little bit more
complicated, but I'll come back to this later on, probably in another
thread, in order to avoid confusion.

Last time I had to do this kind of exercise, a few years ago, I was in a
remote place without Internet access, so I could not get any information
or ask any help. I was kind of surprised/frustrated by the (apparent)
lack of order of the pg_dump output. So I manually wrote scripts to
export the tables and views' definitions separately, one by one, (using
pg_dump, of course) and stack them in the order I wished into a large
file. That was quite suboptimal, but it worked as expected, and I was
able to diff and patch correctly.

And today, I thought: "time has passed, I'm sure that pg_dump must
magically have an option to get the output in some kind of order, by
now"... 'man pg_dump' didn't help. And as I can read this discussion (I
haven't finished yet, obviously), this is not the case.

À+
Pierre
--
____________________________________________________________________________
Pierre Chevalier
PChGEI: Pierre Chevalier Géologue Et Informaticien
Partenaire DALIBO
Mesté Duran
32100 Condom
Tél+fax : 09 75 27 45 62
06 37 80 33 64
Émail : pierrechevaliergeolCHEZfree.fr
icq# : 10432285
jabber: pierre.chevalier1967@jabber.fr
http://pierremariechevalier.free.fr/pierre_chevalier_geologue
____________________________________________________________________________

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#20

Pierre Chevalier Géologue

pierrechevaliergeol@free.fr

almost 10 years ago

In reply to: Sergei Agalakov (#15)

Re: Enhancement request for pg_dump

Le 18/04/2016 03:10, Sergei Agalakov a écrit :

I just wanted to check that my request will have the peoples support.
So far it doesn't.

Well, you can count on my support, for sure!

It looks like that or people never need to compare two PG databases
to find the differences in the schemas or security, or happy to use
the third party tools to do it, and don't want any native support.

I definitely share your opinion. But I understand that pg_dump wasn't
originally designed for this purpose, although such a feature would seem
so natural, at first glance.
So, another idea would be to implement another utility, something very
similar to pg_dump (probably sharing most of its code with it, or
calling pg_dump like my scripts did), but giving an ordered output.
Implementing such a tool outside of postgres would bring a bit more chaos.
So, the tool should be preferably bundled with postgres, to avoid the
need to get a third-party tool.
Hm. When I think about it twice, it sounds like overkilling.

Opinions?

If I see any support from other people for this idea then I shall
go to https://postgresql.uservoice.com/forums/21853-general,

I'd say +1, but the idea should be a bit more matured first, maybe?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#21

Adrian Klaver

adrian.klaver@aklaver.com

almost 10 years ago

In reply to: Pierre Chevalier Géologue (#19)

#22

Pierre Chevalier Géologue

pierrechevaliergeol@free.fr

almost 10 years ago

In reply to: Adrian Klaver (#21)

#23

Adrian Klaver

adrian.klaver@aklaver.com

almost 10 years ago

In reply to: Pierre Chevalier Géologue (#22)