Constant WAL replay
We have toyed around with PostgreSQL's WAL structure and we are
wondering whether it is (theoretically) possible to replay WAL records
coming from a remote host while a system is working in read only mode.
The idea: We are looking for a way to implement a synchronous
single-master / multiple slaves systems.
Meanwhile we are able to serialize / deserialize WAL records and send
them to a group communication system which transports those records to
the slave database.
This is not hard to do. The problem is: How can we replay data on the
slave while the slave is answering read-only SQL-statements?
As far as I understand the code we can either recover the system or
answer SQL statements. Which kind of modifications are necessary to
replay and answer at the same time? (assuming that locks and
transactional information are available on the slave).
Input is very much appreciated.
Best regards,
Hans
--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/664/393 39 74
www.cybertec.at, www.postgresql.at
=?ISO-8859-1?Q?Hans-J=FCrgen_Sch=F6nig?= <postgres@cybertec.at> writes:
We have toyed around with PostgreSQL's WAL structure and we are
wondering whether it is (theoretically) possible to replay WAL records
coming from a remote host while a system is working in read only mode.
There's been some idle speculation about that, and offhand I don't know
of any showstopper reasons why it couldn't be done ... but no one's
actually looked into making it happen. Feel free ;-)
Which kind of modifications are necessary to
replay and answer at the same time? (assuming that locks and
transactional information are available on the slave).
I think you should definitely assume that locks are *not* available, and
probably not any live transaction info. The discussions before included
the idea of being able to do queries against a known-broken,
non-recoverable database without making it any worse than it was before.
In the current code there is no such thing as a hard read-only behavior
--- for example we will try to update commit-status hint bits no matter
what. Allowing that to be turned off would be interesting for a number
of purposes, such as burning a database onto CD.
regards, tom lane
Hans-J�rgen Sch�nig wrote:
The idea: We are looking for a way to implement a synchronous
single-master / multiple slaves systems.
Meanwhile we are able to serialize / deserialize WAL records and send
them to a group communication system which transports those records to
the slave database.
BTW, what is the benefit to using a GCS here?
-Neil
what. Allowing that to be turned off would be interesting for a number
of purposes, such as burning a database onto CD.
FWIW, Oracle suggests a "transportable tablespace" for this feature.
Which is a tablespace that is not written too and which can be read by
any database.
Would that solve the purposes you mean?
Greetings, Klaus
Klaus Naumann <lists@distinctmind.de> writes:
what. Allowing that to be turned off would be interesting for a number
of purposes, such as burning a database onto CD.
FWIW, Oracle suggests a "transportable tablespace" for this feature.
Which is a tablespace that is not written too and which can be read by
any database.
Would that solve the purposes you mean?
It's a very long way from here to there. In particular, since different
installations have different transaction histories, the XIDs in the
table could not be transportable. You'd almost be forced to build
something like the non-MVCC, XID-less table type that was being
speculated about up-thread.
regards, tom lane
On Sun, Apr 24, 2005 at 11:41:17AM -0400, Tom Lane wrote:
Klaus Naumann <lists@distinctmind.de> writes:
what. Allowing that to be turned off would be interesting for a number
of purposes, such as burning a database onto CD.FWIW, Oracle suggests a "transportable tablespace" for this feature.
Which is a tablespace that is not written too and which can be read by
any database.
Would that solve the purposes you mean?It's a very long way from here to there. In particular, since different
installations have different transaction histories, the XIDs in the
table could not be transportable.
Unless the tables are frozen first.
--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo l�gico y coherente. Pero el universo real se halla siempre
un paso m�s all� de la l�gica" (Irulan)
On Sun, Apr 24, 2005 at 08:10:34AM +0200, Hans-J�rgen Sch�nig wrote:
The idea: We are looking for a way to implement a synchronous
single-master / multiple slaves systems.
Meanwhile we are able to serialize / deserialize WAL records and send
them to a group communication system which transports those records to
the slave database.
This is not hard to do.
In fact, I believe Command Prompt's Mammoth Replicator does exactly
this.
--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"La victoria es para quien se atreve a estar solo"
Alvaro Herrera wrote:
On Sun, Apr 24, 2005 at 08:10:34AM +0200, Hans-J�rgen Sch�nig wrote:
The idea: We are looking for a way to implement a synchronous
single-master / multiple slaves systems.
Meanwhile we are able to serialize / deserialize WAL records and send
them to a group communication system which transports those records to
the slave database.
This is not hard to do.In fact, I believe Command Prompt's Mammoth Replicator does exactly
this.
Very close. We don't use the WAL (yet, slated for probably 8.1) but we
do use a transaction log shipping method. So the implementation is
almost the same.
Sincerely,
Joshua D. Drake
Command Prompt, Inc.
Joshua D. Drake wrote:
Alvaro Herrera wrote:
On Sun, Apr 24, 2005 at 08:10:34AM +0200, Hans-J?rgen Sch?nig wrote:
The idea: We are looking for a way to implement a synchronous
single-master / multiple slaves systems.
Meanwhile we are able to serialize / deserialize WAL records and send
them to a group communication system which transports those records to
the slave database.
This is not hard to do.In fact, I believe Command Prompt's Mammoth Replicator does exactly
this.Very close. We don't use the WAL (yet, slated for probably 8.1) but we
do use a transaction log shipping method. So the implementation is
almost the same.
Can you run queries on the slave? If so, how do you handle xid collisions?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Very close. We don't use the WAL (yet, slated for probably 8.1) but we
do use a transaction log shipping method. So the implementation is
almost the same.Can you run queries on the slave? If so, how do you handle xid collisions?
You can run any query that does not modify data on a replicated table.
You can run any non data modifying query on any of the tables.
Sincerely,
Joshua D. Drake
Show quoted text
Joshua D. Drake wrote:
Very close. We don't use the WAL (yet, slated for probably 8.1) but we
do use a transaction log shipping method. So the implementation is
almost the same.Can you run queries on the slave? If so, how do you handle xid collisions?
You can run any query that does not modify data on a replicated table.
You can run any non data modifying query on any of the tables.
So, do you modify the slave to prevent it from grabbing an xid?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Neil Conway wrote:
Hans-J�rgen Sch�nig wrote:
The idea: We are looking for a way to implement a synchronous
single-master / multiple slaves systems.
Meanwhile we are able to serialize / deserialize WAL records and send
them to a group communication system which transports those records to
the slave database.BTW, what is the benefit to using a GCS here?
-Neil
currently i have used it because it is simple.
at the moment we are just doing experiments to see how log information
can be treated. shipping the data is not the problem - the real problem
is getting it back in.
good ideas are welcome ;)
best regards,
hans
--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/664/393 39 74
www.cybertec.at, www.postgresql.at
Joshua D. Drake wrote:
Alvaro Herrera wrote:
On Sun, Apr 24, 2005 at 08:10:34AM +0200, Hans-J�rgen Sch�nig wrote:
The idea: We are looking for a way to implement a synchronous
single-master / multiple slaves systems.
Meanwhile we are able to serialize / deserialize WAL records and send
them to a group communication system which transports those records
to the slave database.
This is not hard to do.In fact, I believe Command Prompt's Mammoth Replicator does exactly
this.Very close. We don't use the WAL (yet, slated for probably 8.1) but we
do use a transaction log shipping method. So the implementation is
almost the same.Sincerely,
Joshua D. Drake
Command Prompt, Inc.
Joshua,
This sounds interesting. If you don't use the WAL but a transaction log
shipping - what does it mean in terms of PostgreSQL? Do you create your
own transaction log?
What really interests me here: Where is you code located in order to
make sure that this things can work reliably?
Best regards,
Hans
--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/664/393 39 74
www.cybertec.at, www.postgresql.at
Joshua,
This sounds interesting. If you don't use the WAL but a transaction log
shipping - what does it mean in terms of PostgreSQL? Do you create your
own transaction log?
Yes.
What really interests me here: Where is you code located in order to
make sure that this things can work reliably?
We are integrated into the PostgreSQL backend.
Sincerely,
Joshua D. Drake
Best regards,
Hans
--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedication Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
Joshua D. Drake wrote:
Joshua,
This sounds interesting. If you don't use the WAL but a transaction log
shipping - what does it mean in terms of PostgreSQL? Do you create your
own transaction log?Yes.
What really interests me here: Where is you code located in order to
make sure that this things can work reliably?We are integrated into the PostgreSQL backend.
So they have to get a new PostgreSQL release from you for every minor
upgrade, I assume at no cost?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Joshua D. Drake wrote:
Joshua,
This sounds interesting. If you don't use the WAL but a transaction
log shipping - what does it mean in terms of PostgreSQL? Do you
create your own transaction log?Yes.
What really interests me here: Where is you code located in order to
make sure that this things can work reliably?We are integrated into the PostgreSQL backend.
.... which is a large place ;).
best regards,
hans
What really interests me here: Where is you code located in order to
make sure that this things can work reliably?We are integrated into the PostgreSQL backend.
So they have to get a new PostgreSQL release from you for every minor
upgrade, I assume at no cost?
Yep :)
Sincerely,
Joshua D. Drake
Command Prompt, Inc.
--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedication Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
On Sun, 2005-04-24 at 02:28 -0400, Tom Lane wrote:
In the current code there is no such thing as a hard read-only behavior --- for example we will try to update commit-status hint bits no matter what. Allowing that to be turned off would be interesting for a number of purposes, such as burning a database onto CD.
Short patch enclosed to turn off writing of commit-status hint bits.
I've called this "cache_txn_status_with_data" but I'm open to other
suggestions...
This should allow migration of older child tables to hierarchical
storage when using a large historical table design.
Best Regards, Simon Riggs
Attachments:
hintbits.patchtext/x-patch; charset=UTF-8; name=hintbits.patchDownload+15-0
Simon Riggs <simon@2ndquadrant.com> writes:
Short patch enclosed to turn off writing of commit-status hint bits.
Doesn't this entirely destroy the ability to truncate clog, and
therefore the ability to survive XID wraparound?
It probably also breaks subxact and multixact logging, but I haven't
looked closely...
regards, tom lane
On Tue, 2005-07-19 at 22:24 -0400, Tom Lane wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
Short patch enclosed to turn off writing of commit-status hint bits.
Doesn't this entirely destroy the ability to truncate clog, and
therefore the ability to survive XID wraparound?
I hope not for all our sakes, since the hint bits are not WAL logged and
anything that relies upon them would be fragile. If we were going to
move a table to WORM storage, then we'd have done VACUUM FREEZE first
anyway, which would be required to avoid XID wraparound. This patch
would then guarantee that no further system-initiated writes take place.
Not AFAICS.... comments from vacuum.c included:
/*
* Now scan all the pages that we moved tuples onto and update tuple
* status bits. This is not really necessary, but will save time for
* future transactions examining these tuples.
*/
update_hint_bits(onerel, fraged_pages, num_fraged_pages,
last_move_dest_block, num_moved);
then again on the function itself...
/*
* update_hint_bits() -- update hint bits in destination pages
*
* Scan all the pages that we moved tuples onto and update tuple status
bits.
* This is normally not really necessary, but it will save time for
future
* transactions examining these tuples.
*
*
* For the non-freeze case, one wonders whether it wouldn't be better to
skip
* this work entirely, and let the tuple status updates happen someplace
* that's not holding an exclusive lock on the relation.
*/
It probably also breaks subxact and multixact logging, but I haven't
looked closely...
AFAIK this has nothing to do with that, since those locks are not
persistent across a crash,,,but conceivably something in 2PC might be
effected.
Best Regards, Simon Riggs