XLog changes for 9.3

Started by Heikki Linnakangas over 13 years ago, 23 messages

heikki.linnakangas@enterprisedb.com

over 13 years ago

When I worked on the XLogInsert scaling patch, it became apparent that
some changes to the WAL format would make it a lot easier. So for 9.3,
I'd like to do some refactoring:

1. Use a 64-bit integer instead of the two-variable log/seg
representation, for identifying a WAL segment. This has no user-visible
effect, but makes the code a bit simpler.

2. Don't waste the last WAL segment in each logical 4GB file. Currently,
we skip the WAL segment ending with "FF". The comments claim that
wasting the last segment "ensures that we don't have problems
representing last-byte-position-plus-1", but in my experience, it just
makes things more complicated. You have two ways to represent the
segment boundary, and some functions are picky on which one is used. For
example, XLogWrite() assumes that when you want to flush to the end of a
logical log file, you use the "5/FF000000" representation, not
"6/00000000". Other functions, like XLogPageRead(), expect the latter.

This is a backwards-incompatible change for external utilities that know
how the WAL segment numbering works. Hopefully there aren't too many of
those around.

3. Move the only field, xl_rem_len, from the continuation record header
straight to the xlog page header, eliminating XLogContRecord altogether.
This makes it easier to calculate in advance how much space a WAL record
requires, as it no longer depends on how many pages it has to be split
across. This wastes 4-8 bytes on every xlog page, but that's not much.

4. Allow WAL record header to be split across page boundaries.
Currently, if there are less than SizeOfXLogRecord bytes left on the
current WAL page, it is wasted, and the next record is inserted at the
beginning of the next page. The problem with that is again that it makes
it impossible to know in advance exactly how much space a WAL record
requires, because it depends on how many bytes need to be wasted at the
end of current page.

These changes will help the XLogInsert scaling patch, by making the
space calculations simpler. In essence, to reserve space for a WAL
record of size X, you just need to do "bytepos += X". There are a lot
more details to it, like mapping from the contiguous byte position
to an XLogRecPtr that takes page headers into account, and noticing
RedoRecPtr changes safely, but it's a start.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Andres Freund

andres@2ndquadrant.com

over 13 years ago

In reply to: Heikki Linnakangas (#1)

Re: XLog changes for 9.3

On Thursday, June 07, 2012 03:50:35 PM Heikki Linnakangas wrote:

When I worked on the XLogInsert scaling patch, it became apparent that
some changes to the WAL format would make it a lot easier. So for 9.3,
I'd like to do some refactoring:

1. Use a 64-bit integer instead of the two-variable log/seg
representation, for identifying a WAL segment. This has no user-visible
effect, but makes the code a bit simpler.

We can define a sensible InvalidXLogRecPtr instead of doing that locally in
loads of places! Yippee.

2. Don't waste the last WAL segment in each logical 4GB file. Currently,
we skip the WAL segment ending with "FF". The comments claim that
wasting the last segment "ensures that we don't have problems
representing last-byte-position-plus-1", but in my experience, it just
makes things more complicated. You have two ways to represent the
segment boundary, and some functions are picky on which one is used. For
example, XLogWrite() assumes that when you want to flush to the end of a
logical log file, you use the "5/FF000000" representation, not
"6/00000000". Other functions, like XLogPageRead(), expect the latter.

This is a backwards-incompatible change for external utilities that know
how the WAL segment numbering works. Hopefully there aren't too many of
those around.

3. Move the only field, xl_rem_len, from the continuation record header
straight to the xlog page header, eliminating XLogContRecord altogether.
This makes it easier to calculate in advance how much space a WAL record
requires, as it no longer depends on how many pages it has to be split
across. This wastes 4-8 bytes on every xlog page, but that's not much.

+1. I don't think this will waste a measurable amount in real-world
scenarios. A very big percentage of pages have continuation records.

4. Allow WAL record header to be split across page boundaries.
Currently, if there are less than SizeOfXLogRecord bytes left on the
current WAL page, it is wasted, and the next record is inserted at the
beginning of the next page. The problem with that is again that it makes
it impossible to know in advance exactly how much space a WAL record
requires, because it depends on how many bytes need to be wasted at the
end of current page.

+0.5. It's somewhat convenient to be able to look at a record before you have
reassembled it over multiple pages. But it's probably not worth the
implementation complexity.
If we do that we can remove all the alignment padding as well. That would be a
problem for you anyway, wouldn't it?

These changes will help the XLogInsert scaling patch, by making the
space calculations simpler. In essence, to reserve space for a WAL
record of size X, you just need to do "bytepos += X". There are a lot
more details to it, like mapping from the contiguous byte position
to an XLogRecPtr that takes page headers into account, and noticing
RedoRecPtr changes safely, but it's a start.

Hm. Wouldn't you need to remove short/long page headers for that as well?

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Heikki Linnakangas (#1)

Re: XLog changes for 9.3

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

When I worked on the XLogInsert scaling patch, it became apparent that
some changes to the WAL format would make it a lot easier. So for 9.3,
I'd like to do some refactoring:

1. Use a 64-bit integer instead of the two-variable log/seg
representation, for identifying a WAL segment. This has no user-visible
effect, but makes the code a bit simpler.

2. Don't waste the last WAL segment in each logical 4GB file. Currently,
we skip the WAL segment ending with "FF". The comments claim that
wasting the last segment "ensures that we don't have problems
representing last-byte-position-plus-1", but in my experience, it just
makes things more complicated.

I think that's actually an indivisible part of point #1. The issue in
the 32+32 representation is that you'd overflow the low-order half when
trying to represent last-byte-of-file-plus-1, and have to do something
with propagating that to the high half. In a 64-bit continuous
addressing scheme the problem goes away, and it would just get more
complicated not less to preserve the "hole".

regards, tom lane

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Andres Freund (#2)

Re: XLog changes for 9.3

On 07.06.2012 17:18, Andres Freund wrote:

On Thursday, June 07, 2012 03:50:35 PM Heikki Linnakangas wrote:

3. Move the only field, xl_rem_len, from the continuation record header
straight to the xlog page header, eliminating XLogContRecord altogether.
This makes it easier to calculate in advance how much space a WAL record
requires, as it no longer depends on how many pages it has to be split
across. This wastes 4-8 bytes on every xlog page, but that's not much.

+1. I don't think this will waste a measurable amount in real-world
scenarios. A very big percentage of pages have continuation records.

Yeah, although the way I'm planning to do it, you'll waste 4 bytes (on
64-bit architectures) even when there is a continuation record, because
of alignment:

typedef struct XLogPageHeaderData
{
    uint16      xlp_magic;      /* magic value for correctness checks */
    uint16      xlp_info;       /* flag bits, see below */
    TimeLineID  xlp_tli;        /* TimeLineID of first record on page */
    XLogRecPtr  xlp_pageaddr;   /* XLOG address of this page */

    /* proposed new field: */
    uint32      xlp_rem_len;    /* bytes remaining of continued record */
} XLogPageHeaderData;

The page header is currently 16 bytes in length, so adding a 4-byte
field to it bumps the aligned size to 24 bytes. Nevertheless, I think we
can well live with that.

4. Allow WAL record header to be split across page boundaries.
Currently, if there are less than SizeOfXLogRecord bytes left on the
current WAL page, it is wasted, and the next record is inserted at the
beginning of the next page. The problem with that is again that it makes
it impossible to know in advance exactly how much space a WAL record
requires, because it depends on how many bytes need to be wasted at the
end of current page.

+0.5. It's somewhat convenient to be able to look at a record before you have
reassembled it over multiple pages. But it's probably not worth the
implementation complexity.

Looking at the code, I think it'll be about the same complexity for
XLogInsert in its current form (it will help the patch I'm working on),
and it makes ReadRecord() a bit more complicated. But not much.

If we do that we can remove all the alignment padding as well. That would be a
problem for you anyway, wouldn't it?

It's not a problem. You just MAXALIGN the size of the record when you
calculate how much space it needs, and then all records become naturally
MAXALIGNed. We could quite easily remove the alignment on-disk if we
wanted to; ReadRecord() already always copies the record to an aligned
buffer, but I wasn't planning to do that.

These changes will help the XLogInsert scaling patch, by making the
space calculations simpler. In essence, to reserve space for a WAL
record of size X, you just need to do "bytepos += X". There are a lot
more details to it, like mapping from the contiguous byte position
to an XLogRecPtr that takes page headers into account, and noticing
RedoRecPtr changes safely, but it's a start.

Hm. Wouldn't you need to remove short/long page headers for that as well?

No, those are ok because they're predictable. Although it would make the
mapping simpler. To convert from a contiguous xlog byte position that
excludes all headers, to XLogRecPtr, you need to do something like this
(I just made this up, probably has bugs, but it's about this complex):

#define UsableBytesInPage (XLOG_BLCKSZ - SizeOfXLogShortPHD)
#define UsableBytesInSegment \
    ((XLOG_SEG_SIZE / XLOG_BLCKSZ) * UsableBytesInPage - \
     (SizeOfXLogLongPHD - SizeOfXLogShortPHD))

uint64 xlogrecptr;
uint64 full_segments = bytepos / UsableBytesInSegment;
uint64 offset_in_segment = bytepos % UsableBytesInSegment;
XLogRecPtr result;

xlogrecptr = full_segments * XLOG_SEG_SIZE;
/* is it on the first page, which carries the long header? */
if (offset_in_segment < XLOG_BLCKSZ - SizeOfXLogLongPHD)
    xlogrecptr += SizeOfXLogLongPHD + offset_in_segment;
else
{
    /* first page is fully used */
    xlogrecptr += XLOG_BLCKSZ;
    offset_in_segment -= XLOG_BLCKSZ - SizeOfXLogLongPHD;
    /* add other full pages */
    xlogrecptr += (offset_in_segment / UsableBytesInPage) * XLOG_BLCKSZ;
    /* and finally the offset within the last page, past its short header */
    xlogrecptr += SizeOfXLogShortPHD + offset_in_segment % UsableBytesInPage;
}
/* finally convert the 64-bit xlogrecptr to an XLogRecPtr struct */
result.xlogid = (uint32) (xlogrecptr >> 32);
result.xrecoff = (uint32) (xlogrecptr & 0xffffffff);

Encapsulated in a function, that's not too bad. But if we want to make
that simpler, one idea would be to allocate the whole 1st page in each
WAL segment for metadata. That way all the actual xlog pages would hold
the same amount of xlog data.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Simon Riggs

simon@2ndQuadrant.com

over 13 years ago

In reply to: Heikki Linnakangas (#1)

Re: XLog changes for 9.3

On 7 June 2012 14:50, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

These changes will help the XLogInsert scaling patch

...and as I'm sure you're aware will junk much of the replication code
and almost certainly set back the other work that we have brewing for
9.3. So this is a very large curve ball you're throwing there.

Personally, I don't think we should do this until we have a better
regression test suite around replication and recovery because the
impact will be huge but I welcome the suggested changes themselves.

If you are going to do this in 9.3, then it has to be early in the
first Commit Fest and you'll need to be around to quickly follow
through on all of the other subsequent breakages it will cause,
otherwise every other piece of work in this area will be halted or
delayed.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Andres Freund

andres@2ndquadrant.com

over 13 years ago

In reply to: Simon Riggs (#5)

Re: XLog changes for 9.3

Hi,

On Thursday, June 07, 2012 05:51:07 PM Simon Riggs wrote:

On 7 June 2012 14:50, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

These changes will help the XLogInsert scaling patch

...and as I'm sure you're aware will junk much of the replication code
and almost certainly set back the other work that we have brewing for
9.3. So this is a very large curve ball you're throwing there.

It's not that bad. Most of that code is pretty abstracted, the changes to
adapt to that should be less than 20 lines. And it would remove some of the
complexity.

Personally, I don't think we should do this until we have a better
regression test suite around replication and recovery because the
impact will be huge but I welcome the suggested changes themselves.

Hm. One could regard the logical rep stuff as a testsuite ;)

If you are going to do this in 9.3, then it has to be early in the
first Commit Fest and you'll need to be around to quickly follow
through on all of the other subsequent breakages it will cause,
otherwise every other piece of work in this area will be halted or
delayed.

Yea, I would definitely welcome an early patch.

Greetings,

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Magnus Hagander

magnus@hagander.net

over 13 years ago

In reply to: Andres Freund (#6)

Re: XLog changes for 9.3

On Thu, Jun 7, 2012 at 5:56 PM, Andres Freund <andres@2ndquadrant.com> wrote:

Hi,

On Thursday, June 07, 2012 05:51:07 PM Simon Riggs wrote:


If you are going to do this in 9.3, then it has to be early in the
first Commit Fest and you'll need to be around to quickly follow
through on all of the other subsequent breakages it will cause,
otherwise every other piece of work in this area will be halted or
delayed.

Yea, I would definitely welcome an early patch.

Just as I'm sure everybody else would welcome *your* patches landing
in the first commitfest and that you all guarantee to be around to
quickly follow through on all potential breakages *that* can cause.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Andres Freund

andres@2ndquadrant.com

over 13 years ago

In reply to: Magnus Hagander (#7)

Re: XLog changes for 9.3

On Thursday, June 07, 2012 06:02:12 PM Magnus Hagander wrote:


Just as I'm sure everybody else would welcome *your* patches landing
in the first commitfest and that you all guarantee to be around to
quickly follow through on all potential breakages *that* can cause.

Agreed.

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 13 years ago

In reply to: Simon Riggs (#5)

Re: XLog changes for 9.3

On 07.06.2012 18:51, Simon Riggs wrote:

On 7 June 2012 14:50, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

These changes will help the XLogInsert scaling patch

...and as I'm sure you're aware will junk much of the replication code
and almost certainly set back the other work that we have brewing for
9.3. So this is a very large curve ball you're throwing there.

I don't think this has much impact on what you're doing (although it's a
bit hard to tell without more details). The way WAL records work is the
same, it's just the code that lays them out on a page, and reads back
from a page, that's changed. And that's fairly isolated in xlog.c.

If you are going to do this in 9.3, then it has to be early in the
first Commit Fest and you'll need to be around to quickly follow
through on all of the other subsequent breakages it will cause,
otherwise every other piece of work in this area will be halted or
delayed.

Yeah, the plan is to get this in early, in the first commit fest. Not
only because of possible breakage, but also because my ultimate goal is
the XLogInsert refactoring, and I want to do that early in the release
cycle, too.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#10

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Simon Riggs (#5)

Re: XLog changes for 9.3

On Thu, Jun 7, 2012 at 11:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

So this is a very large curve ball you're throwing there.

This is not exactly unexpected. At least the first two of these items
were previously discussed in the context of the XLOG scaling patch, many
months ago. It shouldn't come as a surprise to anyone that Heikki is
planning to continue to work on that patch even though it didn't make
9.2.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11

Andres Freund

andres@2ndquadrant.com

over 13 years ago

In reply to: Heikki Linnakangas (#4)

Re: XLog changes for 9.3

On Thursday, June 07, 2012 05:35:11 PM Heikki Linnakangas wrote:

On 07.06.2012 17:18, Andres Freund wrote:

On Thursday, June 07, 2012 03:50:35 PM Heikki Linnakangas wrote:

3. Move the only field, xl_rem_len, from the continuation record header
straight to the xlog page header, eliminating XLogContRecord altogether.
This makes it easier to calculate in advance how much space a WAL record
requires, as it no longer depends on how many pages it has to be split
across. This wastes 4-8 bytes on every xlog page, but that's not much.

+1. I don't think this will waste a measurable amount in real-world
scenarios. A very big percentage of pages have continuation records.

Yeah, although the way I'm planning to do it, you'll waste 4 bytes (on
64-bit architectures) even when there is a continuation record, because
of alignment:

typedef struct XLogPageHeaderData
{
    uint16      xlp_magic;      /* magic value for correctness checks */
    uint16      xlp_info;       /* flag bits, see below */
    TimeLineID  xlp_tli;        /* TimeLineID of first record on page */
    XLogRecPtr  xlp_pageaddr;   /* XLOG address of this page */

    /* proposed new field: */
    uint32      xlp_rem_len;    /* bytes remaining of continued record */
} XLogPageHeaderData;

The page header is currently 16 bytes in length, so adding a 4-byte
field to it bumps the aligned size to 24 bytes. Nevertheless, I think we
can well live with that.

At that point we can just do the
#define SizeofXLogPageHeaderData \
    (offsetof(XLogPageHeaderData, xlp_rem_len) + sizeof(uint32))
dance. If the record can be smeared over two pages there is no point in
storing it aligned. Then we don't waste any additional space in comparison to
the current state.

If we do that we can remove all the alignment padding as well. That would
be a problem for you anyway, wouldn't it?

It's not a problem. You just MAXALIGN the size of the record when you
calculate how much space it needs, and then all records become naturally
MAXALIGNed. We could quite easily remove the alignment on-disk if we
wanted to; ReadRecord() already always copies the record to an aligned
buffer, but I wasn't planning to do that.

What's the reasoning for having alignment on disk if the records aren't stored
contiguously?

These changes will help the XLogInsert scaling patch, by making the
space calculations simpler. In essence, to reserve space for a WAL
record of size X, you just need to do "bytepos += X". There are a lot
more details to it, like mapping from the contiguous byte position
to an XLogRecPtr that takes page headers into account, and noticing
RedoRecPtr changes safely, but it's a start.

Hm. Wouldn't you need to remove short/long page headers for that as well?

No, those are ok because they're predictable.

I haven't read your scalability patch, so I am not really sure what you
need...
The "bytepos += X" from above isn't as easy that way. But yes, it's not that
complicated.

Although it would make the
mapping simpler. To convert from a contiguous xlog byte position that
excludes all headers, to XLogRecPtr, you need to do something like this
(I just made this up, probably has bugs, but it's about this complex):

#define UsableBytesInPage (XLOG_BLCKSZ - SizeOfXLogShortPHD)
#define UsableBytesInSegment \
    ((XLOG_SEG_SIZE / XLOG_BLCKSZ) * UsableBytesInPage - \
     (SizeOfXLogLongPHD - SizeOfXLogShortPHD))

uint64 xlogrecptr;
uint64 full_segments = bytepos / UsableBytesInSegment;
uint64 offset_in_segment = bytepos % UsableBytesInSegment;
XLogRecPtr result;

xlogrecptr = full_segments * XLOG_SEG_SIZE;
/* is it on the first page, which carries the long header? */
if (offset_in_segment < XLOG_BLCKSZ - SizeOfXLogLongPHD)
    xlogrecptr += SizeOfXLogLongPHD + offset_in_segment;
else
{
    /* first page is fully used */
    xlogrecptr += XLOG_BLCKSZ;
    offset_in_segment -= XLOG_BLCKSZ - SizeOfXLogLongPHD;
    /* add other full pages */
    xlogrecptr += (offset_in_segment / UsableBytesInPage) * XLOG_BLCKSZ;
    /* and finally the offset within the last page, past its short header */
    xlogrecptr += SizeOfXLogShortPHD + offset_in_segment % UsableBytesInPage;
}
/* finally convert the 64-bit xlogrecptr to an XLogRecPtr struct */
result.xlogid = (uint32) (xlogrecptr >> 32);
result.xrecoff = (uint32) (xlogrecptr & 0xffffffff);

It's a bit more complicated than that: records can span a good bit more than
just two pages (even more than two segments), and you need to decide for each
of those pages whether it has a long or a short header.

Encapsulated in a function, that's not too bad. But if we want to make
that simpler, one idea would be to allocate the whole 1st page in each
WAL segment for metadata. That way all the actual xlog pages would hold
the same amount of xlog data.

It's a bit easier then, but you probably still need to loop over the size and
subtract until you reach the final point. It's no problem to produce a 100MB
WAL record. But then that's probably nothing to design for.

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#12

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Andres Freund (#11)

Re: XLog changes for 9.3

Andres Freund <andres@2ndquadrant.com> writes:

dance. If the record can be smeared over two pages there is no point in
storing it aligned.

I think this is not true. The value of requiring alignment is that you
can read the record-length field without first having to copy it somewhere.
In particular, it will get really ugly if the record length field itself
could cross a page boundary. I think we want to be able to determine
the record length before we do any data copying, so that we can malloc
the record buffer and then just do one copy step.

The real reason for the current behavior of not letting the record
header get split across multiple pages is so that the length field is
guaranteed to be in the first page. We can still guarantee that if
we (1) put the length field first and (2) require at least int32
alignment. I think losing that property will be pretty bad though.

regards, tom lane

#13

Andres Freund

andres@2ndquadrant.com

over 13 years ago

In reply to: Tom Lane (#12)

Re: XLog changes for 9.3

On Thursday, June 07, 2012 06:53:58 PM Tom Lane wrote:

Andres Freund <andres@2ndquadrant.com> writes:

dance. If the record can be smeared over two pages there is no point in
storing it aligned.

I think this is not true. The value of requiring alignment is that you
can read the record-length field without first having to copy it somewhere.
In particular, it will get really ugly if the record length field itself
could cross a page boundary. I think we want to be able to determine
the record length before we do any data copying, so that we can malloc
the record buffer and then just do one copy step.

Hm, I had assumed the record would get copied into a temp/static buffer first
and only get reassembled together with the data afterwards.
But if that's not the way to go, sure, storing it aligned so that the length
can always be read aligned within a page is sensible.

Andres

#14

Simon Riggs

simon@2ndQuadrant.com

over 13 years ago

In reply to: Heikki Linnakangas (#9)

Re: XLog changes for 9.3

On 7 June 2012 17:12, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 07.06.2012 18:51, Simon Riggs wrote:

On 7 June 2012 14:50, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

These changes will help the XLogInsert scaling patch

...and as I'm sure you're aware will junk much of the replication code
and almost certainly set back the other work that we have brewing for
9.3. So this is a very large curve ball you're throwing there.

I don't think this has much impact on what you're doing (although it's a bit
hard to tell without more details). The way WAL records work is the same,
it's just the code that lays them out on a page, and reads back from a page,
that's changed. And that's fairly isolated in xlog.c.

I wasn't worried about the code overlap, but the subsidiary breakage
looks pretty enormous to me.

Anything changing filenames will break every HA config anybody has
anywhere. So you can pretty much kiss goodbye to the idea of
pg_upgrade. For me, this one thing alone is sufficient to force next
release to be 10.0.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#15

Andres Freund

andres@2ndquadrant.com

over 13 years ago

In reply to: Simon Riggs (#14)

Re: XLog changes for 9.3

On Thursday, June 07, 2012 07:03:32 PM Simon Riggs wrote:

On 7 June 2012 17:12, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 07.06.2012 18:51, Simon Riggs wrote:

On 7 June 2012 14:50, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

These changes will help the XLogInsert scaling patch

...and as I'm sure you're aware will junk much of the replication code
and almost certainly set back the other work that we have brewing for
9.3. So this is a very large curve ball you're throwing there.

I don't think this has much impact on what you're doing (although it's a
bit hard to tell without more details). The way WAL records work is the
same, it's just the code that lays them out on a page, and reads back
from a page, that's changed. And that's fairly isolated in xlog.c.

I wasn't worried about the code overlap, but the subsidiary breakage
looks pretty enormous to me.

The xlog arithmetic will still be encapsulated, so not much difference there.
Removing reading of XLogContRecord isn't complicated and would result in less
code. Shouldn't be much more than that.

Anything changing filenames will break every HA config anybody has
anywhere. So you can pretty much kiss goodbye to the idea of
pg_upgrade. For me, this one thing alone is sufficient to force next
release to be 10.0.

Hm? WAL isn't relevant for pg_upgrade. And the HA setups should rely on
archive_command and such and not do computation of the next/last name. I would
guess removing that corner-case actually fixes more tools than it breaks.

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#16

Kevin Grittner

Kevin.Grittner@wicourts.gov

over 13 years ago

In reply to: Simon Riggs (#14)

Re: XLog changes for 9.3

Simon Riggs <simon@2ndQuadrant.com> wrote:

Anything changing filenames will break every HA config anybody has
anywhere.

It will impact our scripts related to backup and archiving, but I
think we're talking about two or three staff days to cover it in our
shop.

We should definitely make sure that this change is conspicuously
noted. The scariest part is that there will now be files that
matter with names that previously didn't exist, so lack of action
will cause failure to capture a usable backup. I don't know that it
merits a bump to 10.0, though. We test every backup for usability,
as I believe any shop should; failure to cover this should cause
pretty obvious errors pretty quickly if you are testing your
backups.

-Kevin

#17

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Kevin Grittner (#16)

Re: XLog changes for 9.3

On Thu, Jun 7, 2012 at 1:15 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Simon Riggs <simon@2ndQuadrant.com> wrote:

Anything changing filenames will break every HA config anybody has
anywhere.

It will impact our scripts related to backup and archiving, but I
think we're talking about two or three staff days to cover it in our
shop.

We should definitely make sure that this change is conspicuously
noted. The scariest part is that there will now be files that
matter with names that previously didn't exist, so lack of action
will cause failure to capture a usable backup.

But if you're just using regexp matching against pathnames, your tool
will be just fine. Do your tools actually rely on the occasional
absence of a file in what would otherwise be the usual sequence of
files?

...Robert

#18

Kevin Grittner

Kevin.Grittner@wicourts.gov

over 13 years ago

In reply to: Robert Haas (#17)

Re: XLog changes for 9.3

Robert Haas <robertmhaas@gmail.com> wrote:

But if you're just using regexp matching against pathnames, your
tool will be just fine. Do your tools actually rely on the
occasional absence of a file in what would otherwise be the usual
sequence of files?

To save "snapshot" backups for the long term, we generate a list of
the specific WAL files needed to reach a consistent recovery point
from a given base backup. We keep monthly snapshot backups for a
year. We currently determine the first and last file needed, and
then create a list of all the WAL files to save. We error out if
any are missing, so we do skip the FF file.
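A script that walks the sequence, like the one Kevin describes, must encode the successor rule, and that rule is exactly what changes. Below is a minimal sketch. It assumes 16 MB segments, so the last segment of each 4 GB logical file is ...FE before this change and ...FF after it. `next_wal_segment` and its `skip_ff` flag are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the name of the segment following `name` into
 * next (25 bytes). skip_ff = true models the pre-9.3 rule (last segment of a
 * logical file is FE); false models the proposed rule (FF is used too).
 * Assumes 16 MB segments. Returns false for names invalid under the rule. */
static bool next_wal_segment(const char *name, bool skip_ff, char next[25])
{
    unsigned int tli, log, seg;
    unsigned int last_seg = skip_ff ? 0xFEu : 0xFFu;

    /* Require exactly 24 uppercase hex digits before parsing. */
    if (strlen(name) != 24 || strspn(name, "0123456789ABCDEF") != 24)
        return false;
    if (sscanf(name, "%8X%8X%8X", &tli, &log, &seg) != 3 || seg > last_seg)
        return false;

    if (seg == last_seg) {
        if (log == 0xFFFFFFFFu)   /* no further logical file */
            return false;
        log++;                    /* roll over into the next logical file */
        seg = 0;
    } else {
        seg++;
    }
    snprintf(next, 25, "%08X%08X%08X", tli, log, seg);
    return true;
}
```

A script would start at the first needed file, call the helper until it reaches the last one, and check that each name exists. If `skip_ff` is still true after the change, every FF-numbered file is left out of the expected list and never saved. That is how a script can silently capture an unusable backup.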

-Kevin

#19

Robert Haas

robertmhaas@gmail.com

over 13 years ago

In reply to: Kevin Grittner (#18)

Re: XLog changes for 9.3

On Thu, Jun 7, 2012 at 1:40 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Robert Haas <robertmhaas@gmail.com> wrote:

But if you're just using regexp matching against pathnames, your
tool will be just fine. Do your tools actually rely on the
occasional absence of a file in what would otherwise be the usual
sequence of files?

To save "snapshot" backups for the long term, we generate a list of
the specific WAL files needed to reach a consistent recovery point
from a given base backup. We keep monthly snapshot backups for a
year. We currently determine the first and last file needed, and
then create a list of all the WAL files to save. We error out if
any are missing, so we do skip the FF file.

OK, I see. Still, I think there are a lot of people who don't do
anything that complex, and won't be affected. But I agree we had
better clearly release-note it as an incompatibility.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#20

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Simon Riggs (#14)

Re: XLog changes for 9.3

Simon Riggs <simon@2ndQuadrant.com> writes:

Anything changing filenames will break every HA config anybody has
anywhere.

This seems like nonsense to me. How many external scripts are likely to
know that we skip the FF page? There might be some, but not many.

So you can pretty much kiss goodbye to the idea of pg_upgrade.

And that is certainly nonsense. I don't think pg_upgrade even knows
about this, and if it does we can surely fix it.

For me, this one thing alone is sufficient to force next release to be
10.0.

Huh? We make incompatible changes in major versions all the time.
This one does not appear to me to be worse than many others.

regards, tom lane

#21

Simon Riggs

simon@2ndQuadrant.com

over 13 years ago

In reply to: Tom Lane (#20)

Re: XLog changes for 9.3

On 7 June 2012 19:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

Anything changing filenames will break every HA config anybody has
anywhere.

This seems like nonsense to me. How many external scripts are likely to
know that we skip the FF page? There might be some, but not many.

If that is the only change in filenames, then all is forgiven.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#22

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Simon Riggs (#21)

Re: XLog changes for 9.3

Simon Riggs <simon@2ndQuadrant.com> writes:

On 7 June 2012 19:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:

This seems like nonsense to me. How many external scripts are likely to
know that we skip the FF page? There might be some, but not many.

If that is the only change in filenames, then all is forgiven.

Oh, now I see what you're on about. Yes, I agree that we should
maintain the same formatting of WAL segment file names, even though
it will be rather artificial in the 64-bit-arithmetic world. The
only externally visible change should be the creation of FF-numbered
files where formerly those were skipped.
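The code can keep the old three-field names while doing its arithmetic on a single 64-bit segment number. One way is to split that number by the count of segments per 4 GB logical file, which is 256 with 16 MB segments once nothing is skipped. Below is a sketch under that assumption. `segno_to_filename`, `filename_to_segno`, and the constants are hypothetical names, not PostgreSQL's actual macros.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: default 16 MB WAL segments. */
#define WAL_SEG_SIZE (16u * 1024u * 1024u)
/* Segments per 4 GB logical file when none is skipped: 256. */
#define SEGS_PER_XLOGID (UINT64_C(0x100000000) / WAL_SEG_SIZE)

/* Format a 64-bit segment number in the traditional TTTTTTTTLLLLLLLLSSSSSSSS style. */
static void segno_to_filename(uint32_t tli, uint64_t segno, char fname[25])
{
    snprintf(fname, 25, "%08X%08X%08X", (unsigned) tli,
             (unsigned) (segno / SEGS_PER_XLOGID),
             (unsigned) (segno % SEGS_PER_XLOGID));
}

/* Inverse: parse a file name back into timeline and 64-bit segment number. */
static bool filename_to_segno(const char *fname, uint32_t *tli, uint64_t *segno)
{
    unsigned int t, log, seg;

    if (strlen(fname) != 24 || strspn(fname, "0123456789ABCDEF") != 24)
        return false;
    if (sscanf(fname, "%8X%8X%8X", &t, &log, &seg) != 3 || seg >= SEGS_PER_XLOGID)
        return false;
    *tli = t;
    *segno = (uint64_t) log * SEGS_PER_XLOGID + seg;
    return true;
}
```

Under this mapping, segment number 1535 (5 * 256 + 255) formats as 0000000100000005000000FF. The only names that did not exist before are the ones ending in FF, which matches the externally visible change described above.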

regards, tom lane

#23

Bruce Momjian

bruce@momjian.us

over 13 years ago

In reply to: Tom Lane (#20)

Re: XLog changes for 9.3

On Thu, Jun 07, 2012 at 02:52:04PM -0400, Tom Lane wrote:

So you can pretty much kiss goodbye to the idea of pg_upgrade.

And that is certainly nonsense. I don't think pg_upgrade even knows
about this, and if it does we can surely fix it.

pg_upgrade doesn't know anything about xlog files --- all its interaction
in that area is through pg_resetxlog and it doesn't look at the xlog
details.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +