Shared pg_xlog directory/partition and warm standby server

Started by Devrim GUNDUZ about 19 years ago, 8 messages
#1 Devrim GUNDUZ
devrim@CommandPrompt.com

Hello,

Is there anything that would prevent two PostgreSQL servers from sharing
the same pg_xlog directory, with one mounting the partition read-only and
the other using it for both reads and writes? Specifically: if we share
the same pg_xlog between a production server and a warm standby server,
can you see any possibility of data/xlog corruption? Of course, the warm
standby server would mount that partition read-only.

I thought about this for a bit and could not find any. Can you think
of one?

Regards,
--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/

#2 Florian G. Pflug
fgp@phlo.org
In reply to: Devrim GUNDUZ (#1)
Re: Shared pg_xlog directory/partition and warm standby

Devrim GUNDUZ wrote:

Hello,

Is there anything that would prevent two PostgreSQL servers from sharing
the same pg_xlog directory, with one mounting the partition read-only and
the other using it for both reads and writes? Specifically: if we share
the same pg_xlog between a production server and a warm standby server,
can you see any possibility of data/xlog corruption? Of course, the warm
standby server would mount that partition read-only.

What happens if the standby server falls so far behind the master that
the xlogs it wants to read are already being overwritten?

AFAIK the files in pg_xlog form a circular buffer, and are reused after
a while...

greetings, Florian Pflug

#3 Simon Riggs
simon@2ndquadrant.com
In reply to: Florian G. Pflug (#2)
Re: Shared pg_xlog directory/partition and warm standby

On Mon, 2006-11-27 at 14:17 +0100, Florian G. Pflug wrote:

Devrim GUNDUZ wrote:

Is there anything that would prevent two PostgreSQL servers from sharing
the same pg_xlog directory, with one mounting the partition read-only and
the other using it for both reads and writes? Specifically: if we share
the same pg_xlog between a production server and a warm standby server,
can you see any possibility of data/xlog corruption? Of course, the warm
standby server would mount that partition read-only.

What happens if the standby server falls so far behind the master that
the xlogs it wants to read are already being overwritten?

AFAIK the files in pg_xlog form a circular buffer, and are reused after
a while...

If the archive_command doesn't actually do anything and just leaves the
files there, they will automatically be moved to the .done state and will
then be removed within 2 checkpoints. So it will work as long as your
standby keeps up with the primary. If it falls behind, you'll lose the
file and you'll be out of luck (no file, so start from a base backup again).
A large checkpoint_segments would help, but there is no way to avoid that
situation entirely.
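
A toy model of this failure mode may make the point concrete. It is only a sketch under simplifying assumptions, not PostgreSQL's actual recycling logic: the primary is assumed to keep a fixed window of its newest segments (`retained`, a hypothetical stand-in for what checkpoint_segments allows), and the standby replays at a fixed rate per tick.

```python
def first_lost_segment(produced_per_tick, replayed_per_tick, retained, ticks):
    """Return the first segment number the standby needs but cannot find,
    or None if it keeps up for the whole run.

    Toy model: segments are numbered 0, 1, 2, ... and the primary keeps
    only the `retained` newest ones in the shared directory.
    """
    written = 0      # segments the primary has finished so far
    next_needed = 0  # next segment the standby wants to replay
    for _ in range(ticks):
        written += produced_per_tick
        oldest_kept = max(0, written - retained)
        if next_needed < oldest_kept:
            return next_needed  # already removed or recycled
        next_needed = min(written, next_needed + replayed_per_tick)
    return None


# Standby as fast as the primary: nothing is lost.
print(first_lost_segment(4, 4, retained=8, ticks=100))
# Standby slower than the primary: a larger window only delays the loss.
print(first_lost_segment(4, 3, retained=8, ticks=100))
print(first_lost_segment(4, 3, retained=64, ticks=100))
```

In this model a bigger retention window moves the first missing segment further out, but any standby that is persistently slower than the primary eventually hits one.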

The archiver assumes that you want to archive things oldest first, so if
the archive_command fails it will retry on that file repeatedly. Put
another way, archiving is synchronous: when an archive is requested,
we wait for the answer before attempting the next one.

I suppose we might want to have multiple archivals occurring
simultaneously by overlapping their start and stop times. That might be
useful for situations where we have a bank of slow response tape
drives/autoloaders?

You'd need to have a second archive command to poll for completion.
Currently archive_status has 2 states: .ready and .done. We could have 3
states: .ready, .inprogress and .done. The first, archive_command_start,
would on success move the state from .ready to .inprogress, while the
second, archive_command_confirm, would move the state from .inprogress
to .done. (Better names please...)

With an asynchronous API, it would then be possible to fire off requests
to archive lots of files, then return later to confirm their completion.
Or in Devrim's case do nothing apart from wait for them to be applied by
the Standby server.
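
To make the proposed three-state scheme concrete, here is a minimal sketch that models the status files as renames in a temporary directory. Nothing here is an existing PostgreSQL API: `.inprogress`, `archive_command_start` and `archive_command_confirm` are the names floated above, the `finished` flag stands in for whatever polling a real confirm step would do, and the segment names are only illustrative.

```python
import os
import tempfile

# Allowed transitions in the proposed scheme. ".inprogress" and the two
# function names below come from the proposal and are hypothetical.
NEXT_STATE = {".ready": ".inprogress", ".inprogress": ".done"}


def advance(status_dir, segment, current):
    """Rename <segment><current> to the next state and return that state."""
    target = NEXT_STATE[current]
    os.rename(os.path.join(status_dir, segment + current),
              os.path.join(status_dir, segment + target))
    return target


def archive_command_start(status_dir, segment):
    # A real implementation would kick off the (slow) copy here.
    return advance(status_dir, segment, ".ready")


def archive_command_confirm(status_dir, segment, finished):
    # Only move to .done once the copy is known to have completed.
    if not finished:
        return ".inprogress"
    return advance(status_dir, segment, ".inprogress")


with tempfile.TemporaryDirectory() as status_dir:
    segments = ["000000010000000000000001", "000000010000000000000002"]
    for seg in segments:
        open(os.path.join(status_dir, seg + ".ready"), "w").close()

    # Fire off both requests first, then come back to confirm them.
    for seg in segments:
        archive_command_start(status_dir, seg)
    archive_command_confirm(status_dir, segments[0], finished=True)
    archive_command_confirm(status_dir, segments[1], finished=False)

    final_states = sorted(os.listdir(status_dir))
    print(final_states)
```

The first segment ends up in .done while the second stays in .inprogress until a later confirm succeeds, which is the overlap the asynchronous API is meant to allow.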

Anybody else see the need for this?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#4 Jim C. Nasby
jim@nasby.net
In reply to: Simon Riggs (#3)
Re: Shared pg_xlog directory/partition and warm standby

On Mon, Nov 27, 2006 at 04:35:30PM +0000, Simon Riggs wrote:

On Mon, 2006-11-27 at 14:17 +0100, Florian G. Pflug wrote:

Devrim GUNDUZ wrote:

Is there anything that would prevent two PostgreSQL servers from sharing
the same pg_xlog directory, with one mounting the partition read-only and
the other using it for both reads and writes? Specifically: if we share
the same pg_xlog between a production server and a warm standby server,
can you see any possibility of data/xlog corruption? Of course, the warm
standby server would mount that partition read-only.

<snip>

I suppose we might want to have multiple archivals occurring
simultaneously by overlapping their start and stop times. That might be
useful for situations where we have a bank of slow response tape
drives/autoloaders?

You'd need to have a second archive command to poll for completion.
Currently archive_status has 2 states: .ready and .done. We could have 3
states: .ready, .inprogress and .done. The first, archive_command_start,
would on success move the state from .ready to .inprogress, while the
second, archive_command_confirm, would move the state from .inprogress
to .done. (Better names please...)

With an asynchronous API, it would then be possible to fire off requests
to archive lots of files, then return later to confirm their completion.
Or in Devrim's case do nothing apart from wait for them to be applied by
the Standby server.

Anybody else see the need for this?

There might be a desire for async archiving in some circumstances, but I
don't really see what Devrim's after that couldn't just be done with our
current PITR. The only difference I can think of is not having to copy
logfiles around, but presumably that could be addressed by using
hardlinks instead of actually copying (at least on unix...) Maybe Devrim
has something else in mind?
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Florian G. Pflug (#2)
Re: Shared pg_xlog directory/partition and warm standby

"Florian G. Pflug" <fgp@phlo.org> writes:

Devrim GUNDUZ wrote:

Is there anything that would prevent two PostgreSQL servers from sharing
the same pg_xlog directory, with one mounting the partition read-only and
the other using it for both reads and writes?

What happens if the standby server falls so far behind the master that
the xlogs it wants to read are already being overwritten?

Worse than that: what happens when the standby comes alive, and needs to
start writing pg_xlog entries?

Sounds like a disaster in the making to me.

regards, tom lane

#6 Devrim GUNDUZ
devrim@CommandPrompt.com
In reply to: Jim C. Nasby (#4)
Re: Shared pg_xlog directory/partition and warm standby

Hi,

On Mon, 2006-11-27 at 12:14 -0600, Jim C. Nasby wrote:

The only difference I can think of is not having to copy logfiles
around, but presumably that could be addressed by using hardlinks
instead of actually copying (at least on unix...) Maybe Devrim
has something else in mind?

What I was thinking of is a way to reduce network traffic in
high-volume environments. If the archive_timeout is set to a really low
value, such as 1 or 2 seconds, it may result in high traffic.
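
A rough back-of-the-envelope calculation shows the scale of that concern. It is a worst-case sketch under stated assumptions: a segment switch really happens at every timeout, and each switch ships a whole 16 MB segment file uncompressed. An idle server, or compression in the archive step, would lower these numbers. The function name is made up for illustration.

```python
SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size discussed in the thread


def worst_case_traffic(archive_timeout_s, segment_bytes=SEGMENT_BYTES):
    """Bytes per second and per day if a full segment file is shipped at
    every timeout. Assumes a switch really happens each interval."""
    per_second = segment_bytes / archive_timeout_s
    per_day = per_second * 86_400
    return per_second, per_day


for timeout in (1, 2, 60):
    per_second, per_day = worst_case_traffic(timeout)
    print(f"archive_timeout={timeout:>2}s: "
          f"{per_second / 2**20:6.2f} MiB/s, {per_day / 2**30:8.1f} GiB/day")
```

At a one-second timeout the upper bound is 16 MiB/s of sustained traffic, which is why very low values are worth questioning before looking for a shared-partition workaround.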

I thought that if both servers are in the same network, or better,
directly connected to each other, they could share the same partition so
that no network activity occurs.

Anyway, I haven't tried this feature yet on my test server, etc. I am
just trying to understand what's going on and what can be done with this
feature.

Regards,

#7 Zeugswetter Andreas ADI SD
ZeugswetterA@spardat.at
In reply to: Simon Riggs (#3)
Re: Shared pg_xlog directory/partition and warm standby

I suppose we might want to have multiple archivals occurring
simultaneously by overlapping their start and stop times.
That might be useful for situations where we have a bank of slow
response tape drives/autoloaders?

I have never seen a setup where it would have helped to archive
DB logs in parallel. 16 MB is not enough to get the tapes going.
So in setups where you have lots of WAL, I would increase
XLOG_SEG_SIZE. In my experience it is less a DB performance issue than
an administrative and storage system overhead issue (starting a backup
session every few seconds, or even more often than once a second).

E.g., backup systems like TSM perform better when you don't have so many
tiny files, all saved separately.

Anybody else see the need for this?

No :-)

Andreas

#8 Florian G. Pflug
fgp@phlo.org
In reply to: Devrim GUNDUZ (#6)
Re: Shared pg_xlog directory/partition and warm standby

Devrim GUNDUZ wrote:

Hi,

On Mon, 2006-11-27 at 12:14 -0600, Jim C. Nasby wrote:

The only difference I can think of is not having to copy logfiles
around, but presumably that could be addressed by using hardlinks
instead of actually copying (at least on unix...) Maybe Devrim
has something else in mind?

What I was thinking of is a way to reduce network traffic in
high-volume environments. If the archive_timeout is set to a really low
value, such as 1 or 2 seconds, it may result in high traffic.

Using hardlinks sounds like a viable alternative - but since AFAIK
postgres reuses old WAL segments instead of deleting and recreating
them, I guess hardlinks wouldn't work: the link would point at the same
file that later gets overwritten.
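
The hardlink concern can be illustrated with a small experiment. This is a sketch that models segment reuse as "rename the old file, then overwrite it in place", which is an assumption about the recycling behaviour rather than PostgreSQL's actual code; the file names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    wal = os.path.join(d, "segment_A")        # hypothetical segment name
    archived = os.path.join(d, "archived_A")  # "archive" made by linking

    with open(wal, "wb") as f:
        f.write(b"old WAL contents")
    os.link(wal, archived)                    # no data is copied

    # Recycling as modelled here: rename the old file to a future segment
    # name, then overwrite it in place.
    recycled = os.path.join(d, "segment_B")
    os.rename(wal, recycled)
    with open(recycled, "r+b") as f:
        f.write(b"NEW WAL contents")

    with open(archived, "rb") as f:
        archived_bytes = f.read()
    same_inode = os.stat(archived).st_ino == os.stat(recycled).st_ino

print(archived_bytes, same_inode)
```

Because both names refer to the same inode, the "archived" file shows the new contents after the overwrite, so a hardlink only preserves the old segment if the original is deleted rather than reused.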

I thought that if both servers are in the same network, or better,
directly connected to each other, they could share the same partition so
that no network activity occurs.

But if they're connected over a fast network anyway, then copying WAL
files even every few seconds should be no problem, no?

greetings, Florian Pflug