Constant WAL replay

Started by Hans-Jürgen Schönigalmost 21 years ago26 messageshackers

postgres@cybertec.at

almost 21 years ago

We have toyed around with PostgreSQL's WAL structure and we are
wondering whether it is (theoretically) possible to replay WAL records
coming from a remote host while a system is working in read only mode.

The idea: We are looking for a way to implement a synchronous
single-master / multiple slaves systems.
Meanwhile we are able to serialize / deserialize WAL records and send
them to a group communication system which transports those records to
the slave database.
This is not hard to do. The problem is: How can we replay data on the
slave while the slave is answering read-only SQL-statements?

As far as I understand the code we can either recover the system or
answer SQL statements. Which kind of modifications are necessary to
replay and answer at the same time? (assuming that locks and
transactional information are available on the slave).

Input is very much appreciated.

Best regards,

Hans

--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/664/393 39 74
www.cybertec.at, www.postgresql.at

Tom Lane

tgl@sss.pgh.pa.us

almost 21 years ago

In reply to: Hans-Jürgen Schönig (#1)

Re: Constant WAL replay

=?ISO-8859-1?Q?Hans-J=FCrgen_Sch=F6nig?= <postgres@cybertec.at> writes:

We have toyed around with PostgreSQL's WAL structure and we are
wondering whether it is (theoretically) possible to replay WAL records
coming from a remote host while a system is working in read only mode.

There's been some idle speculation about that, and offhand I don't know
of any showstopper reasons why it couldn't be done ... but no one's
actually looked into making it happen. Feel free ;-)

Which kind of modifications are necessary to
replay and answer at the same time? (assuming that locks and
transactional information are available on the slave).

I think you should definitely assume that locks are *not* available, and
probably not any live transaction info.  The discussions before included
the idea of being able to do queries against a known-broken,
non-recoverable database without making it any worse than it was before.
In the current code there is no such thing as a hard read-only behavior
--- for example we will try to update commit-status hint bits no matter
what.  Allowing that to be turned off would be interesting for a number
of purposes, such as burning a database onto CD.

regards, tom lane

Neil Conway

neilc@samurai.com

almost 21 years ago

In reply to: Hans-Jürgen Schönig (#1)

Re: Constant WAL replay

Hans-Jï¿½rgen Schï¿½nig wrote:

The idea: We are looking for a way to implement a synchronous
single-master / multiple slaves systems.
Meanwhile we are able to serialize / deserialize WAL records and send
them to a group communication system which transports those records to
the slave database.

BTW, what is the benefit to using a GCS here?

-Neil

Klaus Naumann

lists@distinctmind.de

almost 21 years ago

In reply to: Tom Lane (#2)

Re: Constant WAL replay

what. Allowing that to be turned off would be interesting for a number
of purposes, such as burning a database onto CD.

FWIW, Oracle suggests a "transportable tablespace" for this feature.
Which is a tablespace that is not written too and which can be read by
any database.
Would that solve the purposes you mean?

Greetings, Klaus

Tom Lane

tgl@sss.pgh.pa.us

almost 21 years ago

In reply to: Klaus Naumann (#4)

Re: Constant WAL replay

Klaus Naumann <lists@distinctmind.de> writes:

what. Allowing that to be turned off would be interesting for a number
of purposes, such as burning a database onto CD.

FWIW, Oracle suggests a "transportable tablespace" for this feature.
Which is a tablespace that is not written too and which can be read by
any database.
Would that solve the purposes you mean?

It's a very long way from here to there. In particular, since different
installations have different transaction histories, the XIDs in the
table could not be transportable. You'd almost be forced to build
something like the non-MVCC, XID-less table type that was being
speculated about up-thread.

regards, tom lane

Alvaro Herrera

alvherre@dcc.uchile.cl

almost 21 years ago

In reply to: Tom Lane (#5)

Re: Constant WAL replay

On Sun, Apr 24, 2005 at 11:41:17AM -0400, Tom Lane wrote:

Klaus Naumann <lists@distinctmind.de> writes:

what. Allowing that to be turned off would be interesting for a number
of purposes, such as burning a database onto CD.

FWIW, Oracle suggests a "transportable tablespace" for this feature.
Which is a tablespace that is not written too and which can be read by
any database.
Would that solve the purposes you mean?

It's a very long way from here to there. In particular, since different
installations have different transaction histories, the XIDs in the
table could not be transportable.

Unless the tables are frozen first.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo lï¿½gico y coherente. Pero el universo real se halla siempre
un paso mï¿½s allï¿½ de la lï¿½gica" (Irulan)

Alvaro Herrera

alvherre@dcc.uchile.cl

almost 21 years ago

In reply to: Hans-Jürgen Schönig (#1)

Re: Constant WAL replay

On Sun, Apr 24, 2005 at 08:10:34AM +0200, Hans-Jï¿½rgen Schï¿½nig wrote:

The idea: We are looking for a way to implement a synchronous
single-master / multiple slaves systems.
Meanwhile we are able to serialize / deserialize WAL records and send
them to a group communication system which transports those records to
the slave database.
This is not hard to do.

In fact, I believe Command Prompt's Mammoth Replicator does exactly
this.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"La victoria es para quien se atreve a estar solo"

Joshua D. Drake

jd@commandprompt.com

almost 21 years ago

In reply to: Alvaro Herrera (#7)

Re: Constant WAL replay

Alvaro Herrera wrote:

On Sun, Apr 24, 2005 at 08:10:34AM +0200, Hans-Jï¿½rgen Schï¿½nig wrote:

The idea: We are looking for a way to implement a synchronous
single-master / multiple slaves systems.
Meanwhile we are able to serialize / deserialize WAL records and send
them to a group communication system which transports those records to
the slave database.
This is not hard to do.

In fact, I believe Command Prompt's Mammoth Replicator does exactly
this.

Very close. We don't use the WAL (yet, slated for probably 8.1) but we
do use a transaction log shipping method. So the implementation is
almost the same.

Sincerely,

Joshua D. Drake
Command Prompt, Inc.

Bruce Momjian

bruce@momjian.us

almost 21 years ago

In reply to: Joshua D. Drake (#8)

Re: Constant WAL replay

Joshua D. Drake wrote:

Alvaro Herrera wrote:

On Sun, Apr 24, 2005 at 08:10:34AM +0200, Hans-J?rgen Sch?nig wrote:

The idea: We are looking for a way to implement a synchronous
single-master / multiple slaves systems.
Meanwhile we are able to serialize / deserialize WAL records and send
them to a group communication system which transports those records to
the slave database.
This is not hard to do.

In fact, I believe Command Prompt's Mammoth Replicator does exactly
this.

Very close. We don't use the WAL (yet, slated for probably 8.1) but we
do use a transaction log shipping method. So the implementation is
almost the same.

Can you run queries on the slave? If so, how do you handle xid collisions?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#10

Joshua D. Drake

jd@commandprompt.com

almost 21 years ago

In reply to: Bruce Momjian (#9)

Re: Constant WAL replay

Very close. We don't use the WAL (yet, slated for probably 8.1) but we
do use a transaction log shipping method. So the implementation is
almost the same.

Can you run queries on the slave? If so, how do you handle xid collisions?

You can run any query that does not modify data on a replicated table.
You can run any non data modifying query on any of the tables.

Sincerely,

Joshua D. Drake

Show quoted text

#11

Bruce Momjian

bruce@momjian.us

almost 21 years ago

In reply to: Joshua D. Drake (#10)

Re: Constant WAL replay

Joshua D. Drake wrote:

Very close. We don't use the WAL (yet, slated for probably 8.1) but we
do use a transaction log shipping method. So the implementation is
almost the same.

Can you run queries on the slave? If so, how do you handle xid collisions?

You can run any query that does not modify data on a replicated table.
You can run any non data modifying query on any of the tables.

So, do you modify the slave to prevent it from grabbing an xid?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#12

Hans-Jürgen Schönig

postgres@cybertec.at

almost 21 years ago

In reply to: Neil Conway (#3)

Re: Constant WAL replay

Neil Conway wrote:

Hans-Jï¿½rgen Schï¿½nig wrote:

The idea: We are looking for a way to implement a synchronous
single-master / multiple slaves systems.
Meanwhile we are able to serialize / deserialize WAL records and send
them to a group communication system which transports those records to
the slave database.

BTW, what is the benefit to using a GCS here?

-Neil

currently i have used it because it is simple.
at the moment we are just doing experiments to see how log information
can be treated. shipping the data is not the problem - the real problem
is getting it back in.

good ideas are welcome ;)

best regards,

hans

--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/664/393 39 74
www.cybertec.at, www.postgresql.at

#13

Hans-Jürgen Schönig

postgres@cybertec.at

almost 21 years ago

In reply to: Joshua D. Drake (#8)

Re: Constant WAL replay

Joshua D. Drake wrote:

Alvaro Herrera wrote:

On Sun, Apr 24, 2005 at 08:10:34AM +0200, Hans-Jï¿½rgen Schï¿½nig wrote:

The idea: We are looking for a way to implement a synchronous
single-master / multiple slaves systems.
Meanwhile we are able to serialize / deserialize WAL records and send
them to a group communication system which transports those records
to the slave database.
This is not hard to do.

In fact, I believe Command Prompt's Mammoth Replicator does exactly
this.

Very close. We don't use the WAL (yet, slated for probably 8.1) but we
do use a transaction log shipping method. So the implementation is
almost the same.

Sincerely,

Joshua D. Drake
Command Prompt, Inc.

Joshua,

This sounds interesting. If you don't use the WAL but a transaction log
shipping - what does it mean in terms of PostgreSQL? Do you create your
own transaction log?
What really interests me here: Where is you code located in order to
make sure that this things can work reliably?

Best regards,

Hans

--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/664/393 39 74
www.cybertec.at, www.postgresql.at

#14

Joshua D. Drake

jd@commandprompt.com

almost 21 years ago

In reply to: Hans-Jürgen Schönig (#13)

Re: Constant WAL replay

Joshua,

This sounds interesting. If you don't use the WAL but a transaction log
shipping - what does it mean in terms of PostgreSQL? Do you create your
own transaction log?

Yes.

What really interests me here: Where is you code located in order to
make sure that this things can work reliably?

We are integrated into the PostgreSQL backend.

Sincerely,

Joshua D. Drake

Best regards,

Hans

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedication Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/

#15

Bruce Momjian

bruce@momjian.us

almost 21 years ago

In reply to: Joshua D. Drake (#14)

Re: Constant WAL replay

Joshua D. Drake wrote:

Joshua,

This sounds interesting. If you don't use the WAL but a transaction log
shipping - what does it mean in terms of PostgreSQL? Do you create your
own transaction log?

Yes.

What really interests me here: Where is you code located in order to
make sure that this things can work reliably?

We are integrated into the PostgreSQL backend.

So they have to get a new PostgreSQL release from you for every minor
upgrade, I assume at no cost?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#16

Hans-Jürgen Schönig

postgres@cybertec.at

almost 21 years ago

In reply to: Joshua D. Drake (#14)

Re: Constant WAL replay

Joshua D. Drake wrote:

Joshua,

This sounds interesting. If you don't use the WAL but a transaction
log shipping - what does it mean in terms of PostgreSQL? Do you
create your own transaction log?

Yes.

What really interests me here: Where is you code located in order to
make sure that this things can work reliably?

We are integrated into the PostgreSQL backend.

.... which is a large place ;).

best regards,

hans

#17

Joshua D. Drake

jd@commandprompt.com

almost 21 years ago

In reply to: Bruce Momjian (#15)

Re: Constant WAL replay

What really interests me here: Where is you code located in order to
make sure that this things can work reliably?

We are integrated into the PostgreSQL backend.

So they have to get a new PostgreSQL release from you for every minor
upgrade, I assume at no cost?

Yep :)

Sincerely,

Joshua D. Drake
Command Prompt, Inc.

#18

Simon Riggs

simon@2ndQuadrant.com

over 20 years ago

In reply to: Tom Lane (#2)

Writing Commit Status hint bits (was Re: [HACKERS] Constant WAL replay)

On Sun, 2005-04-24 at 02:28 -0400, Tom Lane wrote:

In the current code there is no such thing as a hard read-only behavior
--- for example we will try to update commit-status hint bits no matter
what.  Allowing that to be turned off would be interesting for a number
of purposes, such as burning a database onto CD.

Short patch enclosed to turn off writing of commit-status hint bits.

I've called this "cache_txn_status_with_data" but I'm open to other
suggestions...

This should allow migration of older child tables to hierarchical
storage when using a large historical table design.

Best Regards, Simon Riggs

#19

Tom Lane

tgl@sss.pgh.pa.us

over 20 years ago

In reply to: Simon Riggs (#18)

Re: Writing Commit Status hint bits (was Re: [HACKERS] Constant WAL replay)

Simon Riggs <simon@2ndquadrant.com> writes:

Short patch enclosed to turn off writing of commit-status hint bits.

Doesn't this entirely destroy the ability to truncate clog, and
therefore the ability to survive XID wraparound?

It probably also breaks subxact and multixact logging, but I haven't
looked closely...

regards, tom lane

#20

Simon Riggs

simon@2ndQuadrant.com

over 20 years ago

In reply to: Tom Lane (#19)

Re: Writing Commit Status hint bits (was Re: [HACKERS] Constant

On Tue, 2005-07-19 at 22:24 -0400, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

Short patch enclosed to turn off writing of commit-status hint bits.

Doesn't this entirely destroy the ability to truncate clog, and
therefore the ability to survive XID wraparound?

I hope not for all our sakes, since the hint bits are not WAL logged and
anything that relies upon them would be fragile. If we were going to
move a table to WORM storage, then we'd have done VACUUM FREEZE first
anyway, which would be required to avoid XID wraparound. This patch
would then guarantee that no further system-initiated writes take place.

Not AFAICS.... comments from vacuum.c included:

/*
* Now scan all the pages that we moved tuples onto and update tuple
* status bits. This is not really necessary, but will save time for
* future transactions examining these tuples.
*/
update_hint_bits(onerel, fraged_pages, num_fraged_pages,
last_move_dest_block, num_moved);

then again on the function itself...

/*
* update_hint_bits() -- update hint bits in destination pages
*
* Scan all the pages that we moved tuples onto and update tuple status
bits.
* This is normally not really necessary, but it will save time for
future
* transactions examining these tuples.
*

*
* For the non-freeze case, one wonders whether it wouldn't be better to
skip
* this work entirely, and let the tuple status updates happen someplace
* that's not holding an exclusive lock on the relation.
*/