Comparing two PostgreSQL databases -- order of pg_dump output

Started by Joe Abbateover 14 years ago15 messages

jma@freedomcircle.com

over 14 years ago

Hi,

In order to compare the schema of two presumably identical databases,
I've been diffing the output of pg_dump -Osx. However, I've found that
the order of the output is not very reliable. For example, after
recreating the Pagila sample database, I find the following:

--- pagila.dmp	2011-08-26 14:34:48.000000000 -0400
+++ pagila.dev-dmp	2011-08-26 14:34:47.000000000 -0400
@@ -1140,7 +1140,7 @@
  --

CREATE TRIGGER last_updated
- BEFORE UPDATE ON city
+ BEFORE UPDATE ON actor
FOR EACH ROW
EXECUTE PROCEDURE last_updated();

@@ -1160,7 +1160,7 @@
  --

CREATE TRIGGER last_updated
- BEFORE UPDATE ON customer
+ BEFORE UPDATE ON category
FOR EACH ROW
EXECUTE PROCEDURE last_updated();
...

The same triggers exist on both databases, it's just that the order is
different (apparently they're output in creation order). This even more
crucial with PostGIS databases, which have several hundred function and
operator pairs where the only difference is one takes arguments of type
geometry and the other uses type geography. There the pg_dump diff
approach is nearly useless.

I thought that comparing database schemas would be quite desirable,
e.g., between development/test and production databases. Is there
perhaps some mechanism or tool that people use for this purpose, or is
this not a requirement?

Incidentally, these comparisons are for the Pyrseas tools I'm
developing. The output of dbtoyaml is predictable (not because of
anything I wrote, but because pyyaml outputs everything in alphabetical
order), and I can compare the YAML outputs quite nicely (however, it
doesn't show me things I haven't implemented yet, e.g., OPERATOR CLASSes
in the case of PostGIS).

Joe

Jaime Casanova

jaime@2ndquadrant.com

over 14 years ago

In reply to: Joe Abbate (#1)

Re: Comparing two PostgreSQL databases -- order of pg_dump output

On Tue, Aug 30, 2011 at 2:07 PM, Joe Abbate <jma@freedomcircle.com> wrote:

Hi,

In order to compare the schema of two presumably identical databases, I've
been diffing the output of pg_dump -Osx. However, I've found that the order
of the output is not very reliable.

what about using pg_dump -Fc -Osx and use pg_restore -l to list
objects. then you can sort and compare objects and then a script that
compare schema of objects extracting them with -P, -T or -t

--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación

Joe Abbate

jma@freedomcircle.com

over 14 years ago

In reply to: Jaime Casanova (#2)

Re: Comparing two PostgreSQL databases -- order of pg_dump output

Hola Jaime,

On 08/30/2011 03:24 PM, Jaime Casanova wrote:

what about using pg_dump -Fc -Osx and use pg_restore -l to list
objects. then you can sort and compare objects and then a script that
compare schema of objects extracting them with -P, -T or -t

That appears to be of limited use (i.e., it would only work for
functions, triggers and tables). pg_restore -L/--use_list is more
comprehensive. So the script would have to do something like the following:

$ pg_dump -Fc -Osx postgis > postgis.dump
$ pg_restore -l postgis.dump | sort -k4 > postgis.list
$ pg_restore -L postgis.list postgis.dump > postgis.sorted

Rinse and repeat on the second database and then diff the .sorted files.
Tried it and although it doesn't completely do the trick it's much
better than diffing the plain text pg_dump outputs (3000+ diff lines vs.
less than 200 and about half of that are actual differences).

Thanks,

Joe

Jaime Casanova

jaime@2ndquadrant.com

over 14 years ago

In reply to: Joe Abbate (#3)

Re: Comparing two PostgreSQL databases -- order of pg_dump output

On Tue, Aug 30, 2011 at 3:14 PM, Joe Abbate <jma@freedomcircle.com> wrote:

Hola Jaime,

On 08/30/2011 03:24 PM, Jaime Casanova wrote:

what about using pg_dump -Fc -Osx and use pg_restore -l to list
objects. then you can sort and compare objects and then a script that
compare schema of objects extracting them with -P, -T or -t

That appears to be of limited use (i.e., it would only work for functions,
triggers and tables). pg_restore -L/--use_list is more comprehensive.
So the script would have to do something like the following:

$ pg_dump -Fc -Osx postgis > postgis.dump
$ pg_restore -l postgis.dump | sort -k4 > postgis.list

why not "sort -k4,5"?

--
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación

Joe Abbate

jma@freedomcircle.com

over 14 years ago

In reply to: Jaime Casanova (#4)

Re: Comparing two PostgreSQL databases -- order of pg_dump output

On 08/30/2011 05:33 PM, Jaime Casanova wrote:

On Tue, Aug 30, 2011 at 3:14 PM, Joe Abbate<jma@freedomcircle.com> wrote:

Hola Jaime,

On 08/30/2011 03:24 PM, Jaime Casanova wrote:

what about using pg_dump -Fc -Osx and use pg_restore -l to list
objects. then you can sort and compare objects and then a script that
compare schema of objects extracting them with -P, -T or -t

That appears to be of limited use (i.e., it would only work for functions,
triggers and tables). pg_restore -L/--use_list is more comprehensive.
So the script would have to do something like the following:

$ pg_dump -Fc -Osx postgis> postgis.dump
$ pg_restore -l postgis.dump | sort -k4> postgis.list

why not "sort -k4,5"?

sort -k4 sorts from the fourth field, the object type, to the end of
line. -k4,5 would sort on the type and schema name. I want to sort on
object name/attributes as well. BTW, I figured out why it doesn't fully
work. For functions, the arguments are listed, e.g.,

82; 1255 700618 FUNCTION public _st_covers(geography, geography) jma
459; 1255 700259 FUNCTION public _st_covers(geometry, geometry) jma

Unfortunately, for operators, the operand types are not included:

843; 2617 699799 OPERATOR public < jma
1861; 2617 700565 OPERATOR public < jma

so the pg_restore -L still keeps the original dump order (geometry
before geography).

Joe

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Joe Abbate (#1)

Re: Comparing two PostgreSQL databases -- order of pg_dump output

Joe Abbate <jma@freedomcircle.com> writes:

In order to compare the schema of two presumably identical databases,
I've been diffing the output of pg_dump -Osx. However, I've found that
the order of the output is not very reliable.

Yeah, we've been around on that before. pg_dump does actually sort the
output items (modulo dependency requirements), but it sorts by the same
"tag" values that are printed by pg_restore -l, and those aren't currently
designed to be unique. It's not too clear if we could get away with
changing the definitions of the tag strings.

regards, tom lane

Joe Abbate

jma@freedomcircle.com

over 14 years ago

In reply to: Tom Lane (#6)

Re: Comparing two PostgreSQL databases -- order of pg_dump output

On 08/30/2011 06:07 PM, Tom Lane wrote:

Yeah, we've been around on that before. pg_dump does actually sort the
output items (modulo dependency requirements), but it sorts by the same
"tag" values that are printed by pg_restore -l, and those aren't currently
designed to be unique. It's not too clear if we could get away with
changing the definitions of the tag strings.

The approach suggested by Jaime works fairly well. The only change I
would make is to add OPERATOR args to the pg_restore -l output, e.g.,

1843; 2617 699799 OPERATOR public <(geometry, geometry) jma
1861; 2617 700565 OPERATOR public <(geography, geography) jma

Joe

Stephen Frost

sfrost@snowman.net

over 14 years ago

In reply to: Joe Abbate (#1)

Re: Comparing two PostgreSQL databases -- order of pg_dump output

* Joe Abbate (jma@freedomcircle.com) wrote:

In order to compare the schema of two presumably identical
databases, I've been diffing the output of pg_dump -Osx.

I'm not sure exactly how it does it, but check_postgres.pl offers this.

http://bucardo.org/wiki/Check_postgres

It also offers a whole slew of other useful things to monitor.

Thanks,

Stephen

Joe Abbate

jma@freedomcircle.com

over 14 years ago

In reply to: Stephen Frost (#8)

Re: Comparing two PostgreSQL databases -- order of pg_dump output

Hi Stephen,

On 08/30/2011 07:11 PM, Stephen Frost wrote:

* Joe Abbate (jma@freedomcircle.com) wrote:

In order to compare the schema of two presumably identical
databases, I've been diffing the output of pg_dump -Osx.

I'm not sure exactly how it does it, but check_postgres.pl offers this.

http://bucardo.org/wiki/Check_postgres

It also offers a whole slew of other useful things to monitor.

Note that what I'm looking for is something to compare just about
EVERYTHING DDL under the PostgreSQL sun: tables, types, functions,
operators, etc. The description of same_schema appears to imply only a
subset of objects are compared (in fact, looking at the code, I can
confirm that limitation).

BTW, I tried installing check_postgres, but not being much into Perl and
not knowing what dependencies it has, "make test" failed 38/42 tests.

Joe

#10

Greg Sabino Mullane

greg@turnstep.com

over 14 years ago

In reply to: Joe Abbate (#9)

Re: Comparing two PostgreSQL databases -- order of pg_dump output

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Note that what I'm looking for is something to compare just about
EVERYTHING DDL under the PostgreSQL sun: tables, types, functions,
operators, etc. The description of same_schema appears to imply only a
subset of objects are compared (in fact, looking at the code, I can
confirm that limitation).

You should try the latest version in git (which will soon be released
as 2.18.0). The same_schema check has been overhauled, and now can also
store a copy of a databases state to allow checking the same database
over time to see what has changed. It doesn't check *everything* yet,
but the only things missing are some of the more obscure items such
as custom conversions. It should be pretty easy to add in anything
that is not already covered, even for someone not versed in Perl.

BTW, I tried installing check_postgres, but not being much into Perl and
not knowing what dependencies it has, "make test" failed 38/42 tests.

That's not much to worry about. It's a pretty straightforward script,
in that it is very easy to determine if it is working for you or not,
even if some of the tests fail. :)

I'm not exactly sure how it does it

check_postgres queries the system catalogs, normalizes some things based
on the version, and creates a Perl object representation of the database.
It then compares that to the same thing from a different database/server,
or to a frozen version of an earlier scan.

- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201108302203
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAk5dl28ACgkQvJuQZxSWSsidhwCeMGEx8eVeaPlyRALuh8VuQ+rN
ynYAoLDGLOFNVbj3+NnRvZpLfgmh6Mgu
=w1eI
-----END PGP SIGNATURE-----

#11

Peter Eisentraut

peter_e@gmx.net

over 14 years ago

In reply to: Stephen Frost (#8)

Re: Comparing two PostgreSQL databases -- order of pg_dump output

On tis, 2011-08-30 at 19:11 -0400, Stephen Frost wrote:

* Joe Abbate (jma@freedomcircle.com) wrote:

In order to compare the schema of two presumably identical
databases, I've been diffing the output of pg_dump -Osx.

I'm not sure exactly how it does it, but check_postgres.pl offers this.

http://bucardo.org/wiki/Check_postgres

That tool is also not without bugs in this regard.

Also, the interface it works with necessarily doesn't offer a good way
to examine the differences in detail; it only shows you that there are
differences.

#12

Peter Eisentraut

peter_e@gmx.net

over 14 years ago

In reply to: Tom Lane (#6)

Re: Comparing two PostgreSQL databases -- order of pg_dump output

On tis, 2011-08-30 at 18:07 -0400, Tom Lane wrote:

Joe Abbate <jma@freedomcircle.com> writes:

In order to compare the schema of two presumably identical databases,
I've been diffing the output of pg_dump -Osx. However, I've found that
the order of the output is not very reliable.

Yeah, we've been around on that before. pg_dump does actually sort the
output items (modulo dependency requirements), but it sorts by the same
"tag" values that are printed by pg_restore -l, and those aren't currently
designed to be unique. It's not too clear if we could get away with
changing the definitions of the tag strings.

It's a bit strange that the tag for a trigger is "name" but the tag for
the trigger's comment is "name ON table". Not having the table name in
the trigger tag sounds wrong, because it makes the tag not very useful
for selecting the trigger from the TOC.

#13

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Peter Eisentraut (#12)

Re: Comparing two PostgreSQL databases -- order of pg_dump output

Peter Eisentraut <peter_e@gmx.net> writes:

On tis, 2011-08-30 at 18:07 -0400, Tom Lane wrote:

Yeah, we've been around on that before. pg_dump does actually sort the
output items (modulo dependency requirements), but it sorts by the same
"tag" values that are printed by pg_restore -l, and those aren't currently
designed to be unique. It's not too clear if we could get away with
changing the definitions of the tag strings.

It's a bit strange that the tag for a trigger is "name" but the tag for
the trigger's comment is "name ON table". Not having the table name in
the trigger tag sounds wrong, because it makes the tag not very useful
for selecting the trigger from the TOC.

I don't think changing that would be a problem. What gets unpleasant is
trying to guarantee that pg_dump object tags are unconditionally unique.
That would, for example, mean that every argument type of every function
would have to be written out fully-schema-qualified.

Short of that sort of anal-retentiveness, there are going to be cases
where the dump order is a bit unpredictable. IMO what we need is a
reasonable compromise between verbosity and uniqueness, such that in
normal cases (ie, where you *didn't* intentionally create near-identical
functions in different schemas) you get a unique ordering. To get to
that, somebody's got to go through all the tag writing code and identify
where the trouble spots are. So far we've heard triggers and operators
nominated ... what else?

regards, tom lane

#14

Joe Abbate

jma@freedomcircle.com

over 14 years ago

In reply to: Tom Lane (#13)

Re: Comparing two PostgreSQL databases -- order of pg_dump output

On 08/31/2011 10:17 AM, Tom Lane wrote:

Short of that sort of anal-retentiveness, there are going to be cases
where the dump order is a bit unpredictable. IMO what we need is a
reasonable compromise between verbosity and uniqueness, such that in
normal cases (ie, where you *didn't* intentionally create near-identical
functions in different schemas) you get a unique ordering. To get to
that, somebody's got to go through all the tag writing code and identify
where the trouble spots are. So far we've heard triggers and operators
nominated ... what else?

So far, for Pyrseas, I've tested aggregates, casts, constraint triggers,
conversions, domains, functions, indexes, languages, operators, rules,
schemas, sequences, tables (including check constraints, primary keys,
foreign keys, unique constraints and inherited tables), triggers, types
(base and composite), views and comments on the various objects. I'll
be testing operator classes and operator families in the coming weeks.
So far, triggers and operators are the only ones that have caused an
issue when using the technique suggested by Jaime (pg_dump -Fc followed
by pg_restore -l). Functions also caused problems in the plain text
pg_dump, e.g., because funcx(geography) sorts after funcx(geometry) if
the latter is created first.

Joe

#15

rsindlin

rsindlin@gmail.com

about 10 years ago

In reply to: Joe Abbate (#14)

Re: Comparing two PostgreSQL databases -- order of pg_dump output

Hi Joe,

I have run into what seems to be a similar issue with pg_dump --schema-only
in its trigger ordering. Did you ever find a satisfactory solution to this?
I posted my specific problem on DBA.StackExchange
<http://dba.stackexchange.com/questions/123691/output-from-pg-dump-schema-only-has-inconsistent-order>
, and based on some research I did, it seems like it could be an issue
related to the Collate setting of the DB. I was wondering if you had come
across anything supporting or refuting that.

Thanks,
-Randall

--
View this message in context: http://postgresql.nabble.com/Comparing-two-PostgreSQL-databases-order-of-pg-dump-output-tp4751332p5877720.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers