Scriptable way to validate a pg_dump restore ?

Started by Laura Smithabout 2 years ago13 messagesgeneral

n5d9xq3ti233xiyif2vp@protonmail.ch

about 2 years ago

Hi

Let's say I've got a scenario where I'm doing a pg_dump replication rather than online streaming, e.g. due to air-gap or whatever.

Is there a scriptable way to validate the restore ? e.g. using doing something clever with ctid or something to ensure both the schema and all its rows were restored to the same point at which the dump was taken ?

Thanks !

Laura

shaheedhaque@gmail.com

about 2 years ago

In reply to: Laura Smith (#1)

Re: Scriptable way to validate a pg_dump restore ?

I'd also like to know how to do this. The current approaches seem, afaict,
to involve making on both end of the connection. Even given the inherently
racy nature of the issue, that seems unwieldy to me.

/messages/by-id/CAHAc2jdAHvp7tFZBP37awcth=T3h5WXCN9KjZOvuTNJaAAC_hg@mail.gmail.com

On Mon, 29 Jan 2024, 14:12 Laura Smith, <n5d9xq3ti233xiyif2vp@protonmail.ch>
wrote:

Show quoted text

Hi

Let's say I've got a scenario where I'm doing a pg_dump replication rather
than online streaming, e.g. due to air-gap or whatever.

Is there a scriptable way to validate the restore ? e.g. using doing
something clever with ctid or something to ensure both the schema and all
its rows were restored to the same point at which the dump was taken ?

Thanks !

Laura

ronljohnsonjr@gmail.com

about 2 years ago

In reply to: Laura Smith (#1)

Re: Scriptable way to validate a pg_dump restore ?

On Mon, Jan 29, 2024 at 3:12 AM Laura Smith <
n5d9xq3ti233xiyif2vp@protonmail.ch> wrote:

Hi

Let's say I've got a scenario where I'm doing a pg_dump replication rather
than online streaming, e.g. due to air-gap or whatever.

Is there a scriptable way to validate the restore ? e.g. using doing
something clever with ctid or something to ensure both the schema and all
its rows were restored to the same point at which the dump was taken ?

1. Since pg_dump creates a logical copy, nothing binary like ctid will work.
2. pg_dump is designed to take a snapshot.If it doesn't, it would fail
for people who do logical replication. However, no one has noticed.

That's kinda like being asked to prove that rocks *always* fall when you
drop them. Either you trust physics, because physics has always worked, or
you must watch every rock, because next time it might not fall. The
analogy is slightly flawed, since we always check the pg_dump and
pg_restore return codes, since something else might impact their function.

But if you still need evidence, here's what I'm doing to verify table and
record counts during a 9.6 -> 14 migration. You'll have to modify it for
your purpose.

Create this table and function beforehand:
CREATE TABLE dba.migration_table_counts (
location text check (location in ('96', '14'))
, table_name text
, row_count bigint
, count_time timestamp without time zone default current_timestamp
, primary key (table_name, location)
);
CREATE OR REPLACE FUNCTION dba.get_table_counts(_p_source TEXT) RETURNS
INTEGER
LANGUAGE plpgsql
AS $func$
DECLARE
r RECORD;
_sql TEXT;
_table_count BIGINT;
BEGIN
FOR r IN select relnamespace::regnamespace::text||'.'||relname as
table_name
from pg_class cla
where relkind = 'r'
and not exists (select 1 -- excludes parent tables
from pg_inherits inh1
where inh1.inhparent = cla.oid)
and relnamespace::regnamespace::text
not in ('pg_catalog', 'information_schema', 'dba')
order by 1
LOOP
_sql := FORMAT('SELECT COUNT(*) FROM %s;', r.table_name);
RAISE NOTICE '% %', to_char(clock_timestamp(), 'YYYY-MM-DD HH24:MI:
SS.MS'), _sql;
EXECUTE _sql INTO _table_count;
--RAISE NOTICE '%', _table_count;
INSERT INTO dba.migration_table_counts (location, table_name,
row_count)
VALUES (_p_source, r.table_name, _table_count);
END LOOP;
RAISE NOTICE '% %', to_char(clock_timestamp(), 'YYYY-MM-DD HH24:MI:SS.MS'),
'Finished';
RETURN 0;
END
$func$;

Run this script in a cron job that executes at the same time as your
pg_dump cron job. Parameters should be for the source database.
#!/bin/bash
declare -gr Server=$1
declare -gr DB=$2
declare -gr Source=$3
if [ -z $Server ]; then echo "Requires a server name."; exit 2; fi
if [ -z $DB ]; then echo "Requires a DB name."; exit 3; fi
if [ -z $Source ]; then echo "Requires a source: 96|14."; exit 4; fi
psql -U postgres -h ${Server} $DB -Xac "DELETE FROM
dba.migration_table_counts WHERE location = '$Source';"
psql -U postgres -h ${Server} $DB -Xac "select * from
dba.get_table_counts('$Source');"

Run the same script on the destination server after the pg_restore is
finished.
Dump the source dba.migration_table_counts then load it into the
destination dba.migration_table_counts.

These two queries run on the destination server will check that all tables
exist in both databases, and that the record counts are the same.

You'll have some slight variations, since the two jobs are in separate
transactions. (Mine won't, since the applications will be shut down, and
pg_hba.conf will block them.)

declare -gr SQL1="
with
c96 as (select table_name, row_count from dba.migration_table_counts
where location = '96'),
c14 as (select table_name, row_count from dba.migration_table_counts
where location = '14')
select c96.*, c14.*
from c96 full join c14 on c96.table_name = c14.table_name
where c96.table_name is null
or c14.table_name is null
order by c96.table_name, c14.table_name;"
psql $DB -ac "$SQL1"

declare -gr SQL2="
with
c96 as (select table_name, row_count from dba.migration_table_counts
where location = '96'),
c14 as (select table_name, row_count from dba.migration_table_counts
where location = '14')
select c96.*, c14.*, c96.row_count - c14.row_count as row_diff
from c96 inner join c14 on c96.table_name = c14.table_name
where c96.row_count != c14.row_count
order by c96.table_name;"
psql $DB -ac "$SQL2"

adrian.klaver@aklaver.com

about 2 years ago

In reply to: Laura Smith (#1)

Re: Scriptable way to validate a pg_dump restore ?

On 1/29/24 00:12, Laura Smith wrote:

Hi

Let's say I've got a scenario where I'm doing a pg_dump replication rather than online streaming, e.g. due to air-gap or whatever.

Is there a scriptable way to validate the restore ? e.g. using doing something clever with ctid or something to ensure both the schema and all its rows were restored to the same point at which the dump was taken ?

Assuming you are using pg_restore on a non-plain text dump file then
from pg_restore.c:

* pg_restore.c

* pg_restore is an utility extracting postgres database definitions
* from a backup archive created by pg_dump using the archiver
* interface.
*
* pg_restore will read the backup archive and
* dump out a script that reproduces
* the schema of the database in terms of
* user-defined types
* user-defined functions
* tables
* indexes
* aggregates
* operators
* ACL - grant/revoke
*
* the output script is SQL that is understood by PostgreSQL
*
* Basic process in a restore operation is:
*
* Open the Archive and read the TOC.
* Set flags in TOC entries, and *maybe* reorder them.
* Generate script to stdout
* Exit

Then:

pg_restore -l -f <output_file> <dump_file>

to get the TOC mentioned above. Walk through that to verify schema is
the same in the restored database.

This will not tell you whether all the data was transferred. You will
either have to trust from pg_dump.c:

* pg_dump will read the system catalogs in a database and dump out a
* script that reproduces the schema in terms of SQL that is
understood
* by PostgreSQL
*
* Note that pg_dump runs in a transaction-snapshot mode transaction,
* so it sees a consistent snapshot of the database including system
* catalogs. However, it relies in part on various specialized backend
* functions like pg_get_indexdef(), and those things tend to look at
* the currently committed state. So it is possible to get 'cache
* lookup failed' error if someone performs DDL changes while a
dump is
* happening. The window for this sort of thing is from the
acquisition
* of the transaction snapshot to getSchemaData() (when pg_dump
acquires
* AccessShareLock on every table it intends to dump). It isn't
very large,
* but it can happen.

Or come up with way to capture the state of the data at the time of dump
and then compare to restored database. Something like Ron posted.

Thanks !

Laura

--
Adrian Klaver
adrian.klaver@aklaver.com

shaheedhaque@gmail.com

about 2 years ago

In reply to: Adrian Klaver (#4)

Re: Scriptable way to validate a pg_dump restore ?

On Mon, 29 Jan 2024, 22:52 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:

On 1/29/24 00:12, Laura Smith wrote:

Hi

Let's say I've got a scenario where I'm doing a pg_dump replication

rather than online streaming, e.g. due to air-gap or whatever.

Is there a scriptable way to validate the restore ? e.g. using doing

something clever with ctid or something to ensure both the schema and all
its rows were restored to the same point at which the dump was taken ?

Assuming you are using pg_restore on a non-plain text dump file then
from pg_restore.c:

* pg_restore.c

* pg_restore is an utility extracting postgres database definitions
* from a backup archive created by pg_dump using the archiver
* interface.
*
* pg_restore will read the backup archive and
* dump out a script that reproduces
* the schema of the database in terms of
* user-defined types
* user-defined functions
* tables
* indexes
* aggregates
* operators
* ACL - grant/revoke
*
* the output script is SQL that is understood by PostgreSQL
*
* Basic process in a restore operation is:
*
* Open the Archive and read the TOC.
* Set flags in TOC entries, and *maybe* reorder them.
* Generate script to stdout
* Exit

Then:

pg_restore -l -f <output_file> <dump_file>

to get the TOC mentioned above. Walk through that to verify schema is
the same in the restored database.

This will not tell you whether all the data was transferred. You will
either have to trust from pg_dump.c:

* pg_dump will read the system catalogs in a database and dump out a
* script that reproduces the schema in terms of SQL that is
understood
* by PostgreSQL
*
* Note that pg_dump runs in a transaction-snapshot mode transaction,
* so it sees a consistent snapshot of the database including system
* catalogs. However, it relies in part on various specialized
backend
* functions like pg_get_indexdef(), and those things tend to look at
* the currently committed state. So it is possible to get 'cache
* lookup failed' error if someone performs DDL changes while a
dump is
* happening. The window for this sort of thing is from the
acquisition
* of the transaction snapshot to getSchemaData() (when pg_dump
acquires
* AccessShareLock on every table it intends to dump). It isn't
very large,
* but it can happen.

Or come up with way to capture the state of the data at the time of dump
and then compare to restored database. Something like Ron posted.

Right, for me, state, not just record count is what I'm interested in (for
the initial full table copy part of replication). So, given the explanation
about the possible per-table window, is there some property of the table
that could be used to confirm that a table has made it across?

I guess there is such a thing since the following incremental syncing would
presumably need it. I had hoped the LSN was this thing, but confirmation
would be great.

Thanks, Shaheed

Show quoted text

Thanks !

Laura

--
Adrian Klaver
adrian.klaver@aklaver.com

adrian.klaver@aklaver.com

about 2 years ago

In reply to: Shaheed Haque (#5)

Re: Scriptable way to validate a pg_dump restore ?

On 1/29/24 09:28, Shaheed Haque wrote:

Right, for me, state, not just record count is what I'm interested in
(for the initial full table copy part of replication). So, given the
explanation about the possible per-table window, is there some property
of the table that could be used to confirm that a table has made it across?

I guess there is such a thing since the following incremental syncing
would presumably need it. I had hoped the LSN was this thing, but
confirmation would be great.

The OP was referring to the pg_dump/pg_restore cycle, you seem to be
referring to logical replication. Is that correct?

Thanks, Shaheed

Thanks !

Laura

--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>

--
Adrian Klaver
adrian.klaver@aklaver.com

shaheedhaque@gmail.com

about 2 years ago

In reply to: Adrian Klaver (#6)

Re: Scriptable way to validate a pg_dump restore ?

On Mon, 29 Jan 2024, 23:57 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:

On 1/29/24 09:28, Shaheed Haque wrote:

Right, for me, state, not just record count is what I'm interested in
(for the initial full table copy part of replication). So, given the
explanation about the possible per-table window, is there some property
of the table that could be used to confirm that a table has made it

across?

I guess there is such a thing since the following incremental syncing
would presumably need it. I had hoped the LSN was this thing, but
confirmation would be great.

The OP was referring to the pg_dump/pg_restore cycle, you seem to be
referring to logical replication. Is that correct?

Yes. But I was under the impression that the initial copy of logical
replication was the same?

Show quoted text

Thanks, Shaheed

Thanks !

Laura

--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>

--
Adrian Klaver
adrian.klaver@aklaver.com

adrian.klaver@aklaver.com

about 2 years ago

In reply to: Shaheed Haque (#7)

Re: Scriptable way to validate a pg_dump restore ?

On 1/29/24 10:12, Shaheed Haque wrote:

Yes. But I was under the impression that the initial copy of logical
replication was the same?

Are you taking about the copy_data option to WITH?

If so yes and no.

Yes as it uses COPY to transfer the data.

No as what COPY transfers can be affected by WHERE clauses on the
publisher. Also if you have cascading publishers/subscriptions the
'original' data maybe upstream of the publisher you are comparing to.
Finally logical replication is generally not static so there is the
issue of determining a point in time for the check.

--
Adrian Klaver
adrian.klaver@aklaver.com

n5d9xq3ti233xiyif2vp@protonmail.ch

about 2 years ago

In reply to: Ron (#3)

Re: Scriptable way to validate a pg_dump restore ?

On Monday, 29 January 2024 at 09:06, Ron Johnson <ronljohnsonjr@gmail.com> wrote:

That's kinda like being asked to prove that rocks always fall when you drop them. Either you trust physics, because physics has always worked, or you must watch every rock, because next time it might not fall. The analogy is slightly flawed, since we always check the pg_dump and pg_restore return codes, since something else might impact their function.

But if you still need evidence, here's what I'm doing to verify table and record counts during a 9.6 -> 14 migration. You'll have to modify it for your purpose.

Thanks Ron !

I must admit that I am willing to trust pg_dump / pg_restore, mostly for the reasons Adrian Klaver implied.

However your script is likely the very thing I was looking for in terms of belt & braces. So I appreciate you publishing it as a source of inspiration !

shaheedhaque@gmail.com

about 2 years ago

In reply to: Adrian Klaver (#8)

Re: Scriptable way to validate a pg_dump restore ?

On Tue, 30 Jan 2024, 00:27 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:

On 1/29/24 10:12, Shaheed Haque wrote:

Yes. But I was under the impression that the initial copy of logical
replication was the same?

Are you taking about the copy_data option to WITH?

If so yes and no.

Yes as it uses COPY to transfer the data.

Yes, this is what I meant.

No as what COPY transfers can be affected by WHERE clauses on the

publisher. Also if you have cascading publishers/subscriptions the
'original' data maybe upstream of the publisher you are comparing to.

Good points, understood. For the next bit, let's assume neither of these
are in play.

Finally logical replication is generally not static so there is the

issue of determining a point in time for the check.

Indeed. I currently have a static source db but would eventually like to
eliminate the implied downtime. What I'd like to provide my user is some
indication of progress initially during the copy_data phase, and for the
future, of the anticipated incremental convergence.

And, as per my other note, I would ideally like to be able to do this using
only a connection to one db.

I was assuming that logical replication needed "something" similar
internally, and was hoping the LSNs were that "something".

Thanks, Shaheed

Show quoted text

--
Adrian Klaver
adrian.klaver@aklaver.com

adrian.klaver@aklaver.com

about 2 years ago

In reply to: Shaheed Haque (#10)

Re: Scriptable way to validate a pg_dump restore ?

On 1/29/24 11:35, Shaheed Haque wrote:

On Tue, 30 Jan 2024, 00:27 Adrian Klaver, <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:

On 1/29/24 10:12, Shaheed Haque wrote:

Yes. But I was under the impression that the initial copy of logical
replication was the same?

Are you taking about the copy_data option to WITH?

If so yes and no.

Yes as it uses COPY to transfer the data.

Yes, this is what I meant.

No as what COPY transfers can be affected by WHERE clauses on the
publisher. Also if you have cascading publishers/subscriptions the
'original' data maybe upstream of the publisher you are comparing to.

Good points, understood. For the next bit, let's assume neither of these
are in play.

Finally logical replication is generally not static so there is the
issue of determining a point in time for the check.

Indeed. I currently have a static source db but would eventually like to
eliminate the implied downtime. What I'd like to provide my user is some

Implied downtime of what?

indication of progress initially during the copy_data phase, and for the
future, of the anticipated incremental convergence.

And, as per my other note, I would ideally like to be able to do this
using only a connection to one db.

I was assuming that logical replication needed "something" similar
internally, and was hoping the LSNs were that "something".

I'm going to say up front I am no expert on the internals of logical
replication. Will point you at:

https://www.postgresql.org/docs/current/protocol-message-formats.html

A quick look at that indicates to me it is more involved then you think.

Thanks, Shaheed

--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>

--
Adrian Klaver
adrian.klaver@aklaver.com

shaheedhaque@gmail.com

about 2 years ago

In reply to: Adrian Klaver (#11)

Re: Scriptable way to validate a pg_dump restore ?

On Tue, 30 Jan 2024, 05:02 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:

On 1/29/24 11:35, Shaheed Haque wrote:

On Tue, 30 Jan 2024, 00:27 Adrian Klaver, <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:

On 1/29/24 10:12, Shaheed Haque wrote:

Yes. But I was under the impression that the initial copy of

logical

replication was the same?

Are you taking about the copy_data option to WITH?

If so yes and no.

Yes as it uses COPY to transfer the data.

Yes, this is what I meant.

No as what COPY transfers can be affected by WHERE clauses on the
publisher. Also if you have cascading publishers/subscriptions the
'original' data maybe upstream of the publisher you are comparing to.

Good points, understood. For the next bit, let's assume neither of these
are in play.

Finally logical replication is generally not static so there is the
issue of determining a point in time for the check.

Indeed. I currently have a static source db but would eventually like to
eliminate the implied downtime. What I'd like to provide my user is some

Implied downtime of what?

indication of progress initially during the copy_data phase, and for the
future, of the anticipated incremental convergence.

And, as per my other note, I would ideally like to be able to do this
using only a connection to one db.

I was assuming that logical replication needed "something" similar
internally, and was hoping the LSNs were that "something".

I'm going to say up front I am no expert on the internals of logical
replication. Will point you at:

https://www.postgresql.org/docs/current/protocol-message-formats.html

A quick look at that indicates to me it is more involved then you think.

I'll take a look. Thanks for the tip and the gentle guidance; it is much
appreciated.

Show quoted text

Thanks, Shaheed

--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>

--
Adrian Klaver
adrian.klaver@aklaver.com

shaheedhaque@gmail.com

about 2 years ago

In reply to: Adrian Klaver (#11)

Re: Scriptable way to validate a pg_dump restore ?

On Tue, 30 Jan 2024, 05:02 Adrian Klaver, <adrian.klaver@aklaver.com> wrote:

On 1/29/24 11:35, Shaheed Haque wrote:

On Tue, 30 Jan 2024, 00:27 Adrian Klaver, <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:

On 1/29/24 10:12, Shaheed Haque wrote:

Yes. But I was under the impression that the initial copy of

logical

replication was the same?

Are you taking about the copy_data option to WITH?

If so yes and no.

Yes as it uses COPY to transfer the data.

Yes, this is what I meant.

No as what COPY transfers can be affected by WHERE clauses on the
publisher. Also if you have cascading publishers/subscriptions the
'original' data maybe upstream of the publisher you are comparing to.

Good points, understood. For the next bit, let's assume neither of these
are in play.

Finally logical replication is generally not static so there is the
issue of determining a point in time for the check.

Indeed. I currently have a static source db but would eventually like to
eliminate the implied downtime. What I'd like to provide my user is some

Implied downtime of what?

Oh, forgot to say: the downtime of my Django based app. Not anything at the
PG level.

Show quoted text

indication of progress initially during the copy_data phase, and for the
future, of the anticipated incremental convergence.

And, as per my other note, I would ideally like to be able to do this
using only a connection to one db.

I was assuming that logical replication needed "something" similar
internally, and was hoping the LSNs were that "something".

I'm going to say up front I am no expert on the internals of logical
replication. Will point you at:

https://www.postgresql.org/docs/current/protocol-message-formats.html

A quick look at that indicates to me it is more involved then you think.

Thanks, Shaheed

--
Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>

--
Adrian Klaver
adrian.klaver@aklaver.com