RE: [GENERAL] Transaction logging
What would it take to have transaction logging added to Postgres? I am a
C/C++ programmer and will consider contributing to the Postgres development
effort. I really like everything I see and read about Postgres. As a
result, I am strongly considering Postgres as the database engine for my
membership database application. My customer is willing to invest in a
commercial database, but most of the commercial databases I have briefly
looked at fall a little short in one way or another. I have several
concerns/needs that I am willing to implement and/or support:
- Outer join support in views
- Transaction logging
- Some form of mirroring, shadowing, or replication
- The current locking mechanism is of some concern. I need to make
sure that one user can read a record and then a second can read and update
that same record.
- If the first user attempts to update that record, what happens?
I know some of these requests are currently being worked on; it would be
helpful to get some idea of when these items are expected to be released.
Thanks, Michael
-----Original Message-----
From: sdupille@i-france.com [SMTP:sdupille@i-france.com]
Sent: Tuesday, February 23, 1999 6:08 AM
To: pgsql-general@postgreSQL.org
Subject: Re: [GENERAL] Transaction logging
Hi!
Peter T Mount <peter@retep.org.uk> writes:
Has anyone implemented transaction logging in Postgres? Any suggestions on
how to easily implement transaction logging? Storing the log file in a text
file seems best, but I am not sure how to open and write to a text file from
a trigger. It would also be nice to post this transaction log against a
backup server.
Just a quick thought, but how about using syslog? That can be used to post
queries to a remote server, and it can be told to store the "postgres"
stuff to a separate file on that server.
Just an idea...
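For concreteness, the syslog route might look roughly like this from C. The
hook itself, its name, and where the backend would call it from are
assumptions; only the syslog(3) API is standard:

    #include <syslog.h>

    /* Hypothetical sketch: if the backend were patched to call this
     * once per query, syslogd could be configured to forward the
     * LOG_LOCAL0 facility to a remote loghost and to store the
     * "postgres" entries in a separate file there. */
    static void
    log_query_via_syslog(const char *query)
    {
        static int opened = 0;

        if (!opened)
        {
            openlog("postgres", LOG_PID, LOG_LOCAL0);
            opened = 1;
        }
        syslog(LOG_INFO, "%s", query);
    }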
Why not, but I think it's a bad idea. Syslog is used to log
events coming from the system. It stores every connection to the
system, and any event which can affect the system (such as a power
shutdown).
Transaction logging is a different kind of log: it must
store every transaction made to the database, and in case of
deletion of records, or changes to data, it must save the old
values. So it generates a lot of traffic, and is closely dependent
on the database system.
Syslog is not strong enough to deal with so much data, and
the use of an external process to handle the transaction logging would
generate too much traffic (the cost in time would be too high). The
logging facility should, I think, be provided by the database itself.
One more thing: the logging facility is used (mainly) to recover the
database after a crash. This kind of log _must_ be easy to
use in case of a crash. Syslog is fine when you want to know what
happened, but not for dealing with the problem. Don't forget that
syslog adds some data to the events we send to it (such as the sender
and the date of the message). In a recovery driven by the transaction
log, these data are noise, which makes the recovery (a little bit)
harder.
I don't think that we could get a logging facility with the
use of triggers. I think it would be better to hack the Postgres
backend and intercept the SQL primitives (like insert or
update). It would be a little harder to implement, but faster and
totally transparent to the user.
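A minimal sketch of that kind of interception, assuming a wrapper around
the heap access methods (txlog_write() is an invented name; heap_insert()
stands in for the real backend entry point, and Relation/HeapTuple come
from the backend headers):

    extern void txlog_write(Relation rel, HeapTuple tup);  /* invented */

    /* Hypothetical wrapper: if every internal call to heap_insert()
     * went through here instead, each inserted row would also reach
     * the transaction log, transparently to the user. */
    Oid
    log_heap_insert(Relation rel, HeapTuple tup)
    {
        txlog_write(rel, tup);        /* append the after-image */
        return heap_insert(rel, tup); /* then do the real insert */
    }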
regards.
--
___
{~._.~} Stephane - DUST - Dupille
( Y ) You were dust and you shall turn into dust
()~*~() email : sdupille@i-france.com
(_)-(_)
I have several concerns/needs that I am willing to implement and/or support:
- Outer join support in views
In the works. Perhaps for 6.5, probably not.
- Transaction logging
- Some form of mirroring, shadowing, or replication
- The current locking mechanism is of some concern. I need to make
sure that one user can read a record and then a second can read and update
that same record.
MVCC locking in 6.5. Will do what you need.
- If the first user attempts to update that record, what happens?
Hard to explain. It will wait, or update a new copy of the row while
reads use an older copy of the row.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
You're getting me excited about 6.5. Is there a projected release date for
6.5? Is there any information on transaction logging? Is there anything I
can do to help? I am curious about these items because they will make my
life much easier in the upcoming months as I migrate my application to
Postgres. Working around these could be very difficult or nearly impossible.
We are waiting for the MVCC/locking stuff to be finished. Everything
else is mostly ready. We were planning for Feb 1, but we must wait.
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
I've spent some time thinking about a transaction log. The
first idea was to log queries in some way. But I had to
drop that approach because there are functions (and users
could have written their own ones too) that don't return
the same result when the database later gets rolled forward
(e.g. anything handling dates/times). And OTOH an
application could SELECT something from the database that
maybe got created by a sequence, and use this value then in
another INSERT. But when recovering the database, it isn't
guaranteed that all the data will get the same sequence
values again (race conditions in concurrent queries). How should the
transaction log then know that this one constant value in the
query must be substituted by another value to ensure
referential integrity? Absolutely impossible.
So the only way I see is to use some sort of image logging
from inside the heap access methods. That would be much more
tricky to dump and recover.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #
On Tue, 23 Feb 1999, Michael Davis wrote:
I have several concerns/needs that I am willing to implement and/or support:
- Outer join support in views
- Transaction logging
- Some form of mirroring, shadowing, or replication
For this purpose, people might be interested in reading the following document:
Distributed Relational Database Architecture
http://www.opengroup.org/publications/catalog/c812.htm
ABSTRACT: This Technical Standard is one of three volumes
documenting the Distributed Relational Database Architecture
Specification. This volume describes the connectivity between
relational database managers that enables applications programs
to access distributed relational data. It describes the
necessary connection between an application and a relational
database management system in a distributed environment; the
responsibilities of the participants and when flows should occur;
and the formats and protocols required for distributed database
management system processing. It does not describe an API for
distributed database management system processing.
They have a PDF available for download.
If people do decide to start working on db mirroring, this might be a good
doc to read, if nothing else just to get a better understanding of the
issues involved.
--
Todd Graham Lewis 32°49'N, 83°36'W (800) 719-4664, x2804
******Linux****** MindSpring Enterprises tlewis@mindspring.net
"A pint of sweat will save a gallon of blood." -- George S. Patton
I spent some time on transaction logging since it's a feature
I'm missing too. There are mainly two different transaction
log mechanisms out there:
1. Log queries sent to the backend.
2. Log images of inserted/updated rows and row IDs of
deleted ones.
Query-level logging will write less information if
queries usually affect a large number of rows. Unfortunately
the extensibility of Postgres works against this approach.
There could be any number of user-written functions whose
results aren't reproducible during recovery. And core
features of Postgres itself would introduce the same problem.
Take a sequence which is used to create default values for
multiple tables, so that one ID is unique across them. Now
two backends insert (with INSERT ... SELECT) concurrently
into different tables using the same sequence. It's a
classic race condition, and it depends on context switching
and page faults which backend will get which sequence
numbers. You cannot foresee it and you cannot reproduce it,
unless you hook into the sequence generator and log this too. Later,
when recovering, another hook into the sequence generator
must reproduce the logged results on a per
backend/transaction/command basis, and the same must be done
for each function that usually returns unreproducible
results (anything dealing with time, PIDs, etc.).
As said, this must also cover user functions. So at least
there must be a general log API that provides such
functionality for user-written functions.
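Such an API might amount to no more than a log/replay pair that any
unreproducible function calls around its result. A sketch under assumed
names; none of this exists in the backend (Datum and bool as the backend
typedefs them):

    extern bool  InRecoveryMode;            /* assumed global flag   */
    extern Datum txlog_next_value(void);    /* invented: read back   */
    extern void  txlog_put_value(Datum d);  /* invented: record it   */

    /* Hypothetical: during normal operation the value is appended to
     * the transaction log; during recovery the logged value is
     * returned instead of a fresh one, so the query computes the
     * same result again. */
    Datum
    txlog_reproducible(Datum value)
    {
        if (InRecoveryMode)
            return txlog_next_value();
        txlog_put_value(value);
        return value;
    }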
The image logging approach also has problems. First, the only
thing given to the heap access methods to outdate a tuple on
update/delete is the current tuple ID (information that tells
which tuple in which block is meant). So you need to save
the database files in binary format, because with the
currently existing dump/restore this could change and the
logged CTIDs would hit the wrong tuples.
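An image-log record along those lines might carry something like the
following. The layout is purely illustrative, not a format proposal; the
types are the ones the backend headers define:

    /* Hypothetical image-log record.  Because the only address of an
     * outdated tuple is its CTID (block plus line number), recovery
     * can replay these records only against a binary-identical copy
     * of the database files. */
    typedef struct TxLogRecord
    {
        TransactionId   xid;     /* transaction that made the change  */
        char            op;      /* 'I'nsert, 'U'pdate or 'D'elete    */
        ItemPointerData ctid;    /* which tuple in which block        */
        uint32          datalen; /* length of the after-image, if any */
        /* after-image bytes follow for insert/update */
    } TxLogRecord;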
Second, you must remember in the log which transaction ID
this information came from, and later whether the transaction
committed or not, so that recovery can set the commit/abort
information in pg_log too. pg_log is a shared system file, and
the transaction IDs are unique only for one server. Using
this information for online replication of a single database
to another Postgres installation will not work.
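For readers unfamiliar with pg_log: it is essentially a big array of
commit-status bits indexed by transaction ID, roughly two bits per
transaction, which is why those IDs mean nothing outside one installation.
A sketch of the lookup; the constants and status codes are assumptions
about the layout, not documented format:

    #define XIDS_PER_BYTE 4   /* two status bits per transaction ID */

    /* Illustrative only: look up the commit status of a transaction
     * in an in-memory copy of pg_log. */
    static int
    xid_status(const unsigned char *pg_log_data, unsigned int xid)
    {
        unsigned char byte  = pg_log_data[xid / XIDS_PER_BYTE];
        int           shift = (xid % XIDS_PER_BYTE) * 2;

        return (byte >> shift) & 0x03;  /* in progress/committed/aborted */
    }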
Third, there are still some shared system catalogs across all
databases (pg_database, pg_group, pg_log!!!, pg_shadow and
pg_variable). Due to that, it would be impossible (or at least
very, very tricky) to restore/recover (maybe point in time)
one single database. If you destroy one database and restore
it from the binary backup, these shared catalogs cannot be
restored too, so they're out of sync with the backup time.
How should the recovery then hit the right things (which
probably must not be there at all)?
All this is really a mess. I think the architecture of
Postgres will only allow something on the query level, with some
general API for things that must reproduce the same result
during recovery. For example time(). Inside the backend,
time() should never be called directly. Instead, another
function is to be called that, during normal operation, logs
which time gets returned by this particular function call
and, if the backend is in recovery mode, returns the value
from the log.
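With the hypothetical log/replay API sketched earlier, that indirection for
time() could be as small as this (all names invented):

    #include <time.h>

    /* Hypothetical replacement for direct time() calls inside the
     * backend: log the value on the way out, replay it in recovery. */
    time_t
    pg_current_time(void)
    {
        time_t now;

        if (InRecoveryMode)
            return (time_t) txlog_next_value();  /* invented, as above */
        now = time(NULL);
        txlog_put_value((Datum) now);            /* invented, as above */
        return now;
    }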
And again, this all means trouble. Usually, most queries sent
to the database don't change any data because they are
SELECTs. It would dramatically blow up the log volume if you
logged ALL queries instead of only those that modify things. But
when the query begins, you don't know this, because a SELECT
might call a function that uses SPI to UPDATE something else.
So the decision whether the query must be logged or not can only
be made when the query is done (by having some global
variable where the heap access methods set a flag that
something got written). Now you have to log function calls
like time() even if the query will not modify any single row
in the database, because the query is a
SELECT 'now'::datetime - updtime FROM ...
Doing this on a table with thousands of rows will definitely
waste much logging space and slow down the whole thing with
unnecessary logging.
Maybe it's a compromise if, at each query start, the actual
time and other such information is remembered by the backend,
all time() calls return this remembered value instead of the
real one (which wouldn't be bad anyway, IMHO), and this information
is logged only if the query is to be logged.
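That compromise is a once-per-query snapshot instead of one log entry per
call; a sketch with invented names:

    #include <time.h>

    /* Hypothetical: snapshot the clock once at query start.  Every
     * time() call in the query then sees the same value, and only one
     * timestamp needs logging -- and only if the query turns out to
     * modify anything. */
    static time_t query_start_time;

    void
    at_query_start(void)
    {
        query_start_time = time(NULL);
    }

    time_t
    pg_current_time(void)   /* alternative to the version above */
    {
        return query_start_time;
    }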
Finally, I think I must have missed some more problems, but
aren't these enough already to frustrate you :-?
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #
All things considered, I would be content with logging insert, update, and
delete instructions to an external file. It would be extremely valuable to
allow this log file to exist on a mounted file system so it could be stored
on a different machine. It would also be valuable to have the user name
originating the transaction, a date/time stamp, and possibly a transaction
number as optional fields appended to the logs. Maybe these optional fields
could be added to the SQL statement as comments? Are comments allowed in an
SQL statement? This would give a visual record of what has been updated,
when it was updated, and who performed the update. At a later date this
could be extended, or an alternate means of logging could be added. My vote
will always go for additional options.
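(Comments are indeed allowed in SQL: "--" runs to end of line, and /* ... */
is the block form.) A log line of that shape could be produced like this;
the function and the exact format are just assumptions for illustration:

    #include <stdio.h>
    #include <time.h>

    /* Hypothetical: append one modifying statement to an external log
     * file, with the originating user, a timestamp and a transaction
     * number riding along in an SQL comment, so each line could later
     * be replayed as-is. */
    static void
    append_to_statement_log(FILE *logf, const char *query,
                            const char *user, unsigned long xid)
    {
        time_t now = time(NULL);

        fprintf(logf, "%s; -- user=%s xid=%lu time=%s",
                query, user, xid, ctime(&now)); /* ctime() adds the \n */
        fflush(logf);                           /* survive a crash     */
    }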
The sooner this is implemented the better for me. I am willing to be
coached on how to implement this. I will be busy for the next 2-4 weeks as
I migrate my Access97 database to PostgreSQL. Once this migration is
completed and released to my users, this will become a hot topic for me.
Thanks, Michael
Wouldn't it be possible to have the commit take care of the logging? I
don't know enough about the backend, but all those functions have to get
evaluated at some point before the commit in order to be written to the
db. You could actually have the commit log the needed SQL commands
without any function interference. Also, doesn't commit know if any rows
have been updated by any of the transaction's statements? This
could/would require filtering rules and triggers from the output, or
turning them off on the restore.
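A sketch of that commit-time decision, assuming the global
written-something flag Jan mentioned (all names invented; bool as the
backend typedefs it):

    extern bool xact_wrote_something;           /* set by heap access  */
    extern void txlog_flush_statements(void);   /* invented            */

    /* Hypothetical commit hook: the transaction's saved statements go
     * to the log only if something was actually written, so pure
     * SELECTs cost nothing. */
    void
    at_commit(void)
    {
        if (xact_wrote_something)
            txlog_flush_statements();
        xact_wrote_something = false;
    }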
As far as the shared tables/catalogues are concerned, we could have
each full-database dump take care of its entries in the shared
tables. Having the transaction logs be incremental from the point of
the dump allows the catalogues to be self-updating (assuming we are
logging vacuums).
All this depends on getting a database commit to output its information
easily. It also ignores MVCC's effect on transactions, but I think it's
a workable model.
You can tell me I'm blowing smoke now.
-DEJ
I like it.
On Fri, 5 Mar 1999, Michael Davis wrote:
All things considered, I would be content with logging insert, update, and
delete instructions to an external file. It would be extremely valuable to
allow this log file to exist on a mounted file system so it could be stored
on a different machine. It would also be valuable to have the user name
originating the transaction, a date/time stamp, and possibly a transaction
number as optional fields appended to the logs.
I agree that this would be an _excellent_ first cut at logging. Yes, if
you have "insert time()", then you will have problems, but for the majority
of applications, what is suggested above will be good enough, and I'm a big
anti-fan of letting the best be the enemy of the good.
Maybe these optional fields
could be added to the SQL statement as comments? Are comments allowed in an
SQL statement? This would give a visual record of what has been updated,
when it was updated, and who performed the update. At a later date this
could be extended, or an alternate means of logging could be added. My vote
will always go for additional options.
Just define a document format for the logs which allows for this data.
If you do it in XML (GNOME and Mozilla both have great free XML libraries
available), then all of the normal, nasty issues, like whitespace and
parsing, are completely taken care of for you. Just a suggestion.
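For instance, the record emitted per statement might look like the
following (element and attribute names invented; a real implementation
would let the XML library do the escaping):

    #include <stdio.h>

    /* Hypothetical XML-shaped log record, one per modifying statement.
     * No escaping is done here -- handling that (and whitespace, and
     * parsing on the way back in) is exactly what the XML libraries
     * would take care of. */
    static void
    write_xml_log_record(FILE *logf, const char *user,
                         const char *stamp, const char *query)
    {
        fprintf(logf, "<stmt user=\"%s\" time=\"%s\">%s</stmt>\n",
                user, stamp, query);
    }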
--
Todd Graham Lewis 32°49'N, 83°36'W (800) 719-4664, x2804
******Linux****** MindSpring Enterprises tlewis@mindspring.net
"A pint of sweat will save a gallon of blood." -- George S. Patton
Added to TODO:
* Transaction log, so re-do log can be on a separate disk by
logging SQL queries, or before/after row images
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Added to TODO:
* Transaction log, so re-do log can be on a separate disk by
logging SQL queries, or before/after row images
I would drop the "log SQL queries" idea.
No need to log before-row images either, since this is the
current state of the row during rollforward.
(For asserts, a checksum of the before image would be sufficient,
but IMHO not necessary.)
I suggest:
* Transaction log that stores after-row (or even only column)
images, which can be put on a separate disk to allow rollforward
after a restore of a server.
The "restore of a server" is the main problem here, but I suggest the
following additional backup tool, which could be used for a "restore of
a server", which could then be used for a rollforward and would also be
a lot faster than a pg_dump:
1. place a vacuum lock on the db (we don't want vacuum during backup)
2. backup pg_log using direct file access (something like dd bs=32k)
3. backup the rest in any order (same as pg_log)
4. release the vacuum lock
If this were restored, it should lead to a consistent database
that has all transactions after the start of the backup rolled back.
Is there a no-no in this idea? I feel it should work.
A problem is probably that the first backend to touch a row with a
committed update stores this info in that row. There would probably
need to be an undo for this after a restore of the physical files.
Andreas
Well, I've been thinking about WAL for the last two weeks. Hiroshi pointed
out to me problems in my approach to savepoints (when a tuple was marked
for update and updated after it) and a solution would require a
new tid field in the header and both t_cmin/t_cmax => a bigger header.
I don't like it, and so I switched my mind -:).
I'm using the "Transaction Processing..." book from Bruce - thanks
a lot, it's very helpful.
I'll come back with thoughts and feelings in the next few days...
Zeugswetter Andreas IZ5 wrote:
Added to TODO:
* Transaction log, so re-do log can be on a separate disk by
logging SQL queries, or before/after row images
I would drop the "log SQL queries" idea.
Me too.
No need to log before-row images either, since this is the
current state of the row during rollforward.
This is true as long as we follow non-overwriting - may be
changed some day.
The "restore of a server" is the main problem here, but I suggest the
following additional backup tool, which could be used for a "restore of
a server", which could then be used for a rollforward and would also be
a lot faster than a pg_dump:
1. place a vacuum lock on the db (we don't want vacuum during backup)
2. backup pg_log using direct file access (something like dd bs=32k)
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3. backup the rest in any order (same as pg_log)
4. release the vacuum lock
It looks like log archiving, not backup.
I believe that a _full_ backup will do nearly the same
things as pg_dump now, but an _incremental_ backup will
fetch info about what changed after the last _full_ backup
from the log.
Vadim
Updated TODO:
* Transaction log, so re-do log can be on a separate disk,
with after-row images
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Sorry, I was wrong. pg_dump is what's known as the Export utility
in Oracle, and backup is quite a different thing. But I have
corrections for the full backup described above:
1. no vacuum lock is needed: all vacuum ops will be logged
in the normal way to roll back changes on failures;
2. all datafiles have to be backed up _before_ the log backup,
due to WAL logic: changes must be written to the log before
they're written to on-disk data pages (see the sketch below).
Vadim
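Point 2 is the classic write-ahead rule from the literature (the book
mentioned above covers it): a dirty data page may be written only after
the log has been flushed up to that page's last log record. A sketch in
which every name is assumed, not existing code:

    typedef unsigned long LogRecPtr;            /* byte position in log */

    extern LogRecPtr get_page_lsn(char *page);  /* last log rec for page */
    extern LogRecPtr get_flushed_lsn(void);     /* log durable up to here */
    extern void      flush_log_to(LogRecPtr to);
    extern void      write_page_to_disk(char *page);

    void
    write_data_page(char *page)
    {
        LogRecPtr page_lsn = get_page_lsn(page);

        if (page_lsn > get_flushed_lsn())
            flush_log_to(page_lsn);   /* force the log out first     */
        write_page_to_disk(page);     /* only now write the data page */
    }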
Vadim Mikheev wrote:
1. no vacuum lock is needed: all vacuum ops will be logged
in the normal way to roll back changes on failures;
Yes.
2. all datafiles have to be backed up _before_ log backup
due to WAL logic: changes must be written to log before
they'll be written to on-disk data pages.
When I was talking about pg_log, I meant pg_log as it is now.
As I understand it, it only stores commit/rollback info for each used
xtid, and no other info.
This would be all we need for a rollback of all transactions that were not
committed at the time the backup began, as long as no vacuum removes
the old rows (and these are not reused). The xtids that are higher than the
largest xtid in pg_log also need to be rolled back. I am not sure, though,
whether we have enough info after the commit is flushed to the new row.
This flush would have to be undone at restore time.
I like this approach more than always needing a transaction log at restore
time.
It makes it possible to configure a db to not write a transaction log,
as PostgreSQL behaves now. After all, a lot of installations only need to
be able to restore the database to the state it was in at the last full
backup.
The main issue is IMHO a very fast consistent online backup,
and a fast foolproof restore of same. The transaction log and
rollforward come after that.
Andreas
PS: for rollback you need the before image of rows; I would keep this in a
separate place like Oracle (rollback segment) and Informix (physical log),
since this info does not need to go to the rollforward tape.
Although if we did keep this info in the WAL, then PostgreSQL could also do
a "rollback in time" by walking this log in the opposite direction.
Might be worth discussion.
Zeugswetter Andreas IZ5 wrote:
2. all datafiles have to be backed up _before_ log backup
due to WAL logic: changes must be written to log before
they'll be written to on-disk data pages.
When I was talking about pg_log, I meant pg_log as it is now.
As I understand it, it only stores commit/rollback info for each used
xtid, and no other info.
Actually, I would like to implement WAL as it's done in other systems.
There would be no more pg_log with xact statuses as now. But for
the first implementation it's easy to leave pg_log as is (UNDO
is hard to implement). In any case WAL will be the _single_ source
of everything - what changed and which transactions were
committed/aborted. From this point of view pg_log will be just
one of the datafiles: on recovery, changes (commit/abort statuses)
will be applied to pg_log just like to other datafiles.
PS: for rollback you need the before image of rows, I would keep
this in a separate place like Oracle (rollback segment) and Informix
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oracle places rollback segments in memory to speed up abort/MVCC.
Before images are in WAL and are used to restore rollback segments
on recovery.
(physical log) since this info does not need to go to the
rollforward tape.
Vadim