How to know if a database has changed
The installation I'm planning will manage several databases, but not all
of them will change every day.
In order to plan/script the pg_dump usage, I would need to know
which databases had some change activity at the end of a given day.
How can it be done?
TIA
On Mon, Dec 11, 2017 at 01:48:44PM -0300, marcelo wrote:
How can it be done?
Implement INSERT/UPDATE/DELETE/TRUNCATE triggers on all
tables and DDL triggers on CREATE, ALTER, DROP, SECURITY
LABEL, COMMENT, GRANT or REVOKE.
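A minimal sketch of that approach, assuming a central log table — the names change_log, log_change, log_ddl, and some_table here are made up for illustration:

```sql
-- Hypothetical central log table, one per database.
CREATE TABLE change_log (
    logged_at  timestamptz NOT NULL DEFAULT now(),
    table_name text,
    operation  text
);

-- Generic trigger function: record which table was touched and how.
CREATE FUNCTION log_change() RETURNS trigger AS $$
BEGIN
    INSERT INTO change_log (table_name, operation)
    VALUES (TG_TABLE_NAME, TG_OP);
    RETURN NULL;  -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

-- One such trigger would be needed on every table.
CREATE TRIGGER t_log_change
    AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE ON some_table
    FOR EACH STATEMENT EXECUTE PROCEDURE log_change();

-- DDL side: an event trigger firing at the end of any DDL command.
CREATE FUNCTION log_ddl() RETURNS event_trigger AS $$
BEGIN
    INSERT INTO change_log (table_name, operation)
    VALUES (NULL, TG_TAG);
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER t_log_ddl ON ddl_command_end
    EXECUTE PROCEDURE log_ddl();
```

A nightly script could then dump only the databases whose change_log contains rows newer than the previous dump, truncating it afterwards.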
Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Hi,
a) Create triggers on the relevant tables (ON UPDATE, ON DELETE, ...).
Let the triggers write some log and check that log, or let the triggers
update one central table, recording the database and table name ("table ... in
database ... has been modified").
b) Enable detailed logging and check the system log.
c) Both ;)
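For option (b), the relevant settings can be turned on from SQL without editing postgresql.conf by hand (a sketch; 'mod' logs all data-modifying statements):

```sql
-- Log every data-modifying statement (requires superuser; takes effect on reload).
ALTER SYSTEM SET log_statement = 'mod';       -- none | ddl | mod | all
ALTER SYSTEM SET log_line_prefix = '%t %d ';  -- timestamp + database name per line
SELECT pg_reload_conf();
```

With the database name in the log_line_prefix, a simple grep over the day's log shows which databases saw writes.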
Regards
walter
On 11.12.2017 at 17:48, marcelo wrote:
How can it be done?
The two inelegant ways I can think of are checking the modification time on the files representing the database, and a query that checks the PK of all tables. If they are ordered PKs you could store the max of each and then compare; otherwise the max of an updated_at column would work as well.
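For the file-timestamp variant, each database's files live in a directory named after its OID under $PGDATA/base, so a script could stat those directories; the OID-to-name mapping comes from pg_database (a sketch):

```sql
-- Map database names to their on-disk directories ($PGDATA/base/<oid>).
-- A script could then compare each directory's mtime with the last dump time.
SELECT oid, datname
FROM pg_database
WHERE NOT datistemplate;
```

Note that checkpoints, autovacuum, and hint-bit updates can touch files without any logical data change, so this check can produce false positives.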
Jim
On December 11, 2017 11:48:44 AM EST, marcelo <marcelo.nicolet@gmail.com> wrote:
How can it be done?
On 11 December 2017 at 11:48, marcelo <marcelo.nicolet@gmail.com> wrote:
How can it be done?
Hi,
Just keep a snapshot of pg_stat_database view
(https://www.postgresql.org/docs/10/static/monitoring-stats.html#PG-STAT-DATABASE-VIEW)
At the end of the day, compare the counters
tup_inserted/tup_updated/tup_deleted, or just xact_commit/xact_rollback.
If the snapshot you kept differs from the current values, there
were modifications on that database.
There may be some caveats; for example, are CREATE commands included
because they insert rows into the system catalogs? I don't know; it's up
to you to check that.
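A minimal sketch of that snapshot-and-compare approach (the stat_snapshot table name is made up; it would live in whichever database runs the nightly script):

```sql
-- Once: keep a copy of today's per-database write counters.
CREATE TABLE stat_snapshot AS
SELECT datname, tup_inserted, tup_updated, tup_deleted
FROM pg_stat_database;

-- At the end of the day: which databases saw writes since the snapshot?
SELECT s.datname
FROM pg_stat_database s
JOIN stat_snapshot snap USING (datname)
WHERE (s.tup_inserted, s.tup_updated, s.tup_deleted)
      IS DISTINCT FROM
      (snap.tup_inserted, snap.tup_updated, snap.tup_deleted);

-- Refresh the snapshot for the next day.
TRUNCATE stat_snapshot;
INSERT INTO stat_snapshot
SELECT datname, tup_inserted, tup_updated, tup_deleted
FROM pg_stat_database;
```

One caveat to keep in mind: these statistics counters can be reset (by pg_stat_reset() or after a crash), in which case the comparison should treat the database as changed.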
--
Jaime Casanova www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11.12.2017 at 17:48, marcelo wrote:
How can it be done?
I think you can use logical replication. Take a look at
https://github.com/eulerto/wal2json (for instance; there are other
solutions available): just capture the WAL and check for changes for
every database.
it's just a rough idea...
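A rough sketch of that idea using logical decoding with the built-in test_decoding plugin instead of wal2json (this assumes wal_level = logical and a free replication slot; one slot per database would be needed, and note that an unconsumed slot retains WAL):

```sql
-- Once per database: create a logical replication slot.
SELECT pg_create_logical_replication_slot('change_probe', 'test_decoding');

-- At the end of the day: any decoded changes pending on this database?
-- peek does not consume the changes.
SELECT count(*) AS pending_changes
FROM pg_logical_slot_peek_changes('change_probe', NULL, NULL);

-- After dumping, consume (discard) the accumulated changes so WAL can be recycled.
SELECT count(*)
FROM pg_logical_slot_get_changes('change_probe', NULL, NULL);
```

The slot name change_probe is arbitrary; any unused name works.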
Regards, Andreas
--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com
Am 11.12.2017 um 18:26 schrieb Andreas Kretschmer:
it's just a rough idea...
... and not perfect, because you can't capture ddl in this way.
Regards, Andreas
I think there's a more useful question, which is why do you want to do
this? If it is just about conditional backups, surely the cost of backup
storage is low enough, even in S3 or the like, that a duplicate backup is
an afterthought from a cost perspective? Before you start jumping through
hoops to make your backups conditional, I'd first do some analysis and
figure out what the real cost of the thing you're trying to avoid actually is,
since my guess is that you are deep into a premature optimization
<http://wiki.c2.com/?PrematureOptimization> here, where either the cost of
the duplicate backup isn't consequential or the frequency of duplicate
backups is effectively zero.
If you decide that there truly is value in conditionally backing up the db,
it would always be possible to run some kind of checksum on the backup and
skip storing it if it matches the previous backup's checksum. Sure, that
would still mean dumping a db that doesn't need to be dumped, but if your
write transaction rate is so low that backups end up being duplicates on a
regular basis, then surely you can afford the cost of a pg_dump without any
significant impact on performance?
On Mon, Dec 11, 2017 at 10:49 AM, Andreas Kretschmer <
andreas@a-kretschmer.de> wrote:
... and not perfect, because you can't capture ddl in this way.
Hi Sam
You are right, and here is the reasoning behind my question: the server
where Postgres will be installed is not on 24/7. It turns on in the
morning and goes off at the end of the day. The idea is that, as part of
the shutdown process, a local backup is made. The next day, that backup
will be copied to the cloud.
In order not to lengthen the shutdown process, we are trying to limit
pg_dump to the databases that have had some change, not so much in their
schema as in their data.
Of course, adding a trigger for every table and every insert/update/delete
operation on every database is not an option.
On 11/12/17 23:23, Sam Gendler wrote:
I think there's a more useful question, which is why do you want to do this?
The next day, that backup will be copied to the cloud.
What does this mean? If it is an rsync of a local dump to a remote host,
use the directory dump format and disable compression; then each table
which didn't change will 'copy' almost instantly.
--
Meetings Coordinator, Michigan Association of Railroad Passengers
537 Shirley St NE Grand Rapids, MI 49503-1754 Phone: 616.581.8010
E-mail: awilliam@whitemice.org GPG#D95ED383 Web: http://www.marp.org
On Tue, Dec 12, 2017 at 07:40:46AM -0500, Adam Tauno Williams wrote:
The next day, that backup will be copied to the cloud.
What does this mean? If it is an rsync of a local dump to a remote host,
use the directory dump format and disable compression; then each table
which didn't change will 'copy' almost instantly.
I think OP wants to minimize the time spent during shutdown,
not during the next day.
Karsten
On 12/12/2017 at 10:14, marcelo wrote:
In order not to lengthen the shutdown process, we are trying to limit
pg_dump to the databases that have had some change, not so much in
their schema as in their data.
Hi, there are plenty of options for secure and optimized backup solutions
for your scenario.
Since you want to back up to the cloud, why not use pgBarman with
differential backup + log (WAL) shipping? You will have zero downtime,
and you can shut down your database any time you want.
In my experience, differential backups (with rsync and symlinks) provide
excellent performance and reduced storage (avoiding duplication), and
work perfectly well; pgBarman also provides automatic rotation of old
backups (you define the rules from a set of options).
I've been using pgBarman since 1.4, and I'm very satisfied with it.
Just my 2c,
Edson Richter